Thesis Open Access

OFFENSIVE SPEECH DETECTION FOR AFAAN OROMOO LANGUAGE ON SOCIAL MEDIA USING SUPERVISED MACHINE LEARNING ALGORITHM

Debelo Negeri Begna


Citation Style Language JSON Export

{
  "DOI": "10.20372/nadre:15096", 
  "author": [
    {
      "family": "Debelo Negeri Begna"
    }
  ], 
  "issued": {
    "date-parts": [
      [
        2022, 
        7, 
        31
      ]
    ]
  }, 
  "abstract": "<p>Major Advisor: Kasahun Abdisa (Ph.D. Candidate)</p>\n\n<p>Offensive speech has become a controversial issue for online communities and social media platforms. In response, researchers have been investigating ways of coping with offensive content and developing systems to detect its different types, such as cyberbullying, hate speech, and aggression. To the best of our knowledge and from what we have reviewed, most research on this topic so far has dealt with English and other languages, mostly because language resources are available for those languages. To address this gap, this paper presents the first annotated Afaan Oromoo dataset for offensive speech detection. This research aims to use a supervised machine learning algorithm to develop an offensive speech detection model for Afaan Oromoo texts on social media sites like Facebook. To this end, we collected posts and comments from the Facebook pages of BBC News Afaan Oromoo, FBC Afaan Oromoo, political parties, politicians, the Oromia Communication Bureau, and public figure artist pages. For this study, 3,740 statements were collected using the Facepager tool and labeled with two classes: non-offensive speech (1,922) and offensive speech (1,818). Text preprocessing was applied to the data to remove irrelevant content such as punctuation, symbols, blank values, white space, and stop words, and to tokenize the text. To find the best combination of supervised machine learning algorithm and feature extraction method for the model, the researcher used an experimental approach. Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Random Forest (RF), and Linear Support Vector Classifier (LSVC) models were trained on the dataset using features extracted from word unigrams, bigrams, trigrams, combined n-grams, TF-IDF, and combined n-grams weighted by TF-IDF. In the model comparison, LR, MNB, RF, and LSVC achieved highest scores of 87.37%, 89.19%, 83.31%, and 87.16%, respectively. The models were also evaluated using 10-fold cross-validation and classification performance measures to compare their performance. Finally, the performance of the proposed model was evaluated using the accuracy score. The performance evaluation shows that Multinomial Naïve Bayes scored the highest accuracy, 89.19%. Keywords: Afaan Oromoo; Offensive; Offensive Speech Detection; Social Media; Machine Learning</p>", 
  "title": "OFFENSIVE SPEECH DETECTION FOR AFAAN OROMOO LANGUAGE ON SOCIAL MEDIA USING SUPERVISED MACHINE LEARNING ALGORITHM", 
  "type": "thesis", 
  "id": "15096"
}
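
The abstract describes training a Multinomial Naïve Bayes classifier on combined word n-gram features. The following is a minimal sketch of that idea using only the Python standard library. It is not the thesis implementation: the class `TinyMultinomialNB`, the helper `word_ngrams`, and the English placeholder comments are all hypothetical, raw n-gram counts are used instead of TF-IDF weights, and no preprocessing beyond lowercasing and whitespace tokenization is applied.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_min=1, n_max=3):
    """Return word n-grams (unigrams to trigrams by default) from lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class TinyMultinomialNB:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = word_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for gram in word_ngrams(text):
                if gram in self.vocab:               # ignore n-grams unseen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical English placeholder comments, not taken from the thesis dataset.
train_texts = [
    "you are a stupid fool",
    "shut up you idiot",
    "thank you for the news",
    "great report have a nice day",
]
train_labels = ["offensive", "offensive", "non-offensive", "non-offensive"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("you idiot"))
print(model.predict("nice news thank you"))
```

A study of this kind would more likely rely on an established machine learning library for feature extraction, TF-IDF weighting, and 10-fold cross-validation; the sketch only shows how n-gram counts feed the per-class log-likelihoods.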