Thesis Open Access

A DEEP LEARNING BASED MULTILINGUAL MULTI-LABEL NEWS CLASSIFICATION

Mekdes Mulugeta


Citation Style Language JSON Export

{
  "DOI": "10.20372/nadre:10431", 
  "author": [
    {
      "family": "Mekdes Mulugeta"
    }
  ], 
  "issued": {
    "date-parts": [
      [
        2024, 
        9, 
        3
      ]
    ]
  }, 
  "abstract": "<p>Language is a fundamental aspect of human communication, enabling individuals to express thoughts, emotions, and ideas. The rise of communication technology has made electronic information widely accessible, positioning NLP as a prominent research area. Text classification is particularly active in NLP, especially for news articles. However, many languages worldwide remain under-resourced, with limited linguistic data available for research or technological development. Ethiopian languages, such as Amharic, Tigrigna, Afaan Oromo, and Somali, face significant challenges in multilingual, multilabel news classification an issue this research aims to address.</p>\n\n<p>Our research utilizes a unique dataset consisting of 14,361 manually annotated news articles spanning the categories of politics, sports, health, business, and technology, sourced from various news outlets. Before using the dataset, we applied several preprocessing tasks, including text cleaning, normalization, tokenization, and stop-word removal. We proposed the XLM-RoBERTa model, a deep learning approach, for both language identification and news classification.</p>\n\n<p>The rich contextual embeddings in XLM-RoBERTa improved our classification. In this study, we evaluate the performance of XLM-RoBERTa against other transformer models, including mBERT and DistilBERT. We evaluated the models using recall, precision, and F1-score matrices to measure their performance. The experimental results demonstrated that the XLM-RoBERTa model achieved better results than mBERT and DistilBERT, with F1-scores of 95.58% for Amharic, 92.44% for Afaan Oromo, 92.04% for Tigrigna, and 86.78% for Somali, respectively. In contrast, mBERT and DistilBERT yielded lower F1 scores for all languages, confirming the superior performance of XLM-RoBERTa.</p>\n\n<p>Our findings highlight the effectiveness of XLM-RoBERTa in handling low-resource languages through transfer learning. This research provides a robust solution for multilingual, multilabel text classification, particularly in resource-limited settings. For the future, we recommend using data balancing to further improve classification performance.</p>", 
  "title": "A DEEP LEARNING BASED MULTILINGUAL MULTI-LABEL NEWS CLASSIFICATION", 
  "type": "thesis", 
  "id": "10431"
}
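
The abstract describes a transformer-based pipeline: preprocessed articles are fed to a fine-tuned XLM-RoBERTa classifier over five categories. As a rough illustration only (the thesis's fine-tuned weights, hyperparameters, and data are not part of this record), a minimal multi-label inference sketch with the Hugging Face transformers library, assuming the public "xlm-roberta-base" checkpoint and an illustrative 0.5 decision threshold, could look like this:

# Minimal sketch of multi-label news classification with XLM-RoBERTa.
# Assumptions: the public "xlm-roberta-base" checkpoint and the five
# category names from the abstract; the thesis's actual fine-tuned
# model and training setup are not included in this record.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["politics", "sports", "health", "business", "technology"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # per-label sigmoid scores
)

def classify(text: str, threshold: float = 0.5) -> list[str]:
    """Return every label whose sigmoid score clears the threshold."""
    inputs = tokenizer(text, truncation=True, max_length=256,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze(0)
    scores = torch.sigmoid(logits)
    return [label for label, score in zip(LABELS, scores)
            if score >= threshold]

# An untrained classification head gives arbitrary scores; only after
# fine-tuning on the annotated articles would this return meaningful
# categories for Amharic, Tigrigna, Afaan Oromo, or Somali text.
print(classify("example news article text"))

Setting problem_type="multi_label_classification" also makes the model apply a per-label sigmoid with binary cross-entropy loss during training, which matches the multi-label setting the abstract describes.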
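The reported per-language results are standard precision, recall, and F1 computations over multi-hot label vectors. A small self-contained sketch with scikit-learn (the arrays below are made-up examples, not the thesis data):

# Sketch of the reported evaluation: precision, recall, and F1 over
# multi-hot label vectors. The arrays are illustrative placeholders.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Rows = articles, columns = the five categories (multi-hot encoding).
y_true = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 1, 0]])

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro", zero_division=0)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")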
