Thesis Open Access

A DEEP LEARNING BASED MULTILINGUAL MULTI-LABEL NEWS CLASSIFICATION

Mekdes Mulugeta


Citation Style Language JSON Export

{
  "DOI": "10.20372/nadre:10431", 
  "author": [
    {
      "family": "Mekdes Mulugeta"
    }
  ], 
  "issued": {
    "date-parts": [
      [
        2024, 
        9, 
        3
      ]
    ]
  }, 
  "abstract": "<p>Language is a fundamental aspect of human communication, enabling individuals to express thoughts, emotions, and ideas. The rise of communication technology has made electronic information widely accessible, positioning NLP as a prominent research area. Text classification is particularly active in NLP, especially for news articles. However, many languages worldwide remain under-resourced, with limited linguistic data available for research or technological development. Ethiopian languages, such as Amharic, Tigrigna, Afaan Oromo, and Somali, face significant challenges in multilingual, multilabel news classification an issue this research aims to address.</p>\n\n<p>Our research utilizes a unique dataset consisting of 14,361 manually annotated news articles spanning the categories of politics, sports, health, business, and technology, sourced from various news outlets. Before using the dataset, we applied several preprocessing tasks, including text cleaning, normalization, tokenization, and stop-word removal. We proposed the XLM-RoBERTa model, a deep learning approach, for both language identification and news classification.</p>\n\n<p>The rich contextual embeddings in XLM-RoBERTa improved our classification. In this study, we evaluate the performance of XLM-RoBERTa against other transformer models, including mBERT and DistilBERT. We evaluated the models using recall, precision, and F1-score matrices to measure their performance. The experimental results demonstrated that the XLM-RoBERTa model achieved better results than mBERT and DistilBERT, with F1-scores of 95.58% for Amharic, 92.44% for Afaan Oromo, 92.04% for Tigrigna, and 86.78% for Somali, respectively. In contrast, mBERT and DistilBERT yielded lower F1 scores for all languages, confirming the superior performance of XLM-RoBERTa.</p>\n\n<p>Our findings highlight the effectiveness of XLM-RoBERTa in handling low-resource languages through transfer learning. This research provides a robust solution for multilingual, multilabel text classification, particularly in resource-limited settings. For the future, we recommend using data balancing to further improve classification performance.</p>", 
  "title": "A DEEP LEARNING BASED MULTILINGUAL MULTI-LABEL NEWS CLASSIFICATION", 
  "type": "thesis", 
  "id": "10431"
}
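
The abstract describes a transformer-based pipeline: preprocessed articles are fed to a fine-tuned XLM-RoBERTa classifier over five categories. As a rough illustration only (the thesis's fine-tuned weights, hyperparameters, and data are not part of this record), a minimal multi-label inference sketch with the Hugging Face transformers library, assuming the public "xlm-roberta-base" checkpoint and an illustrative 0.5 decision threshold, could look like this:

# Minimal sketch of multi-label news classification with XLM-RoBERTa.
# Assumptions: the public "xlm-roberta-base" checkpoint and the five
# category names from the abstract; the thesis's actual fine-tuned
# model and training setup are not included in this record.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["politics", "sports", "health", "business", "technology"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # per-label sigmoid scores
)

def classify(text: str, threshold: float = 0.5) -> list[str]:
    """Return every label whose sigmoid score clears the threshold."""
    inputs = tokenizer(text, truncation=True, max_length=256,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze(0)
    scores = torch.sigmoid(logits)
    return [label for label, score in zip(LABELS, scores)
            if score >= threshold]

# An untrained classification head gives arbitrary scores; only after
# fine-tuning on the annotated articles would this return meaningful
# categories for Amharic, Tigrigna, Afaan Oromo, or Somali text.
print(classify("example news article text"))

Setting problem_type="multi_label_classification" also makes the model apply a per-label sigmoid with binary cross-entropy loss during training, which matches the multi-label setting the abstract describes.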
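The reported per-language results are standard precision, recall, and F1 computations over multi-hot label vectors. A small self-contained sketch with scikit-learn (the arrays below are made-up examples, not the thesis data):

# Sketch of the reported evaluation: precision, recall, and F1 over
# multi-hot label vectors. The arrays are illustrative placeholders.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Rows = articles, columns = the five categories (multi-hot encoding).
y_true = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 1, 0]])

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro", zero_division=0)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")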
