University of Limerick Institutional Repository

Dataset construction for the detection of anti-social behaviour in online communication in Arabic

dc.contributor.author Alakrot, Azalden
dc.contributor.author Murray, Liam
dc.contributor.author Nikolov, Nikola S.
dc.date.accessioned 2019-06-10T11:27:53Z
dc.date.available 2019-06-10T11:27:53Z
dc.date.issued 2018
dc.identifier.uri http://hdl.handle.net/10344/7878
dc.description peer-reviewed en_US
dc.description.abstract Warning: this paper contains a range of words which may cause offence. In recent years, many studies have targeted anti-social behaviour, such as offensive language and cyberbullying, in online communication. Typically, these studies collect data from various accessible sources, and the majority of the resulting datasets are in English. However, to the best of our knowledge, there is no dataset of Arabic text collected from the YouTube platform, and overall there are only a few Arabic datasets, collected from other social platforms, for the purpose of offensive language detection. Therefore, in this paper we contribute to this field by presenting a dataset of Arabic YouTube comments, specifically designed for the detection of offensive language in a machine learning scenario. Our dataset contains a range of offensive language and flaming in the form of YouTube comments. We document the labelling process we conducted, taking into account the differences between Arabic dialects and the diversity in how offensive language is perceived across the Arab world. Furthermore, we present a statistical analysis of the dataset to prepare it for use as a training dataset for predictive modelling. en_US
dc.language.iso eng en_US
dc.publisher Elsevier en_US
dc.relation.ispartofseries Procedia Computer Science;142, pp. 174-181
dc.subject Anti-social behaviour online en_US
dc.subject offensive language en_US
dc.subject harassment detection en_US
dc.subject Arabic dataset en_US
dc.subject Arabic dialects en_US
dc.subject text mining en_US
dc.subject text classification en_US
dc.title Dataset construction for the detection of anti-social behaviour in online communication in Arabic en_US
dc.type info:eu-repo/semantics/conferenceObject en_US
dc.type.supercollection all_ul_research en_US
dc.type.supercollection ul_published_reviewed en_US
dc.identifier.doi 10.1016/j.procs.2018.10.473
dc.rights.accessrights info:eu-repo/semantics/openAccess en_US
dc.internal.rssid 2905297