	---
	license: apache-2.0
	language:
	- am
	- ti
	- ha
	- aa
	base_model:
	- Hailay/EXLMR
	- FacebookAI/xlm-roberta-base
	pipeline_tag: text-classification
	---
	---
	## 1. Model Description
	Hailay/FT_EXLMR is a fine-tuned version of the EXLMR model, designed for sentiment analysis and text classification in low-resource African languages such as Tigrinya, Amharic, and Oromo. It keeps the EXLMR architecture and has been fine-tuned further to improve performance on multilingual tasks, especially for languages that are underrepresented in existing NLP models.
	The model was trained on the AfriSenti-SemEval 2023 dataset, a benchmark sentiment dataset for African languages, which is publicly available on GitHub: [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).

	## 2. Intended Use
	This model is ideal for:
	- Researchers and developers who are working on multilingual sentiment analysis in African languages.
	- Applications that require text classification in low-resource languages.

	It is designed specifically for tasks such as:
	- Sentiment analysis
	- Text classification

	Note: Without further fine-tuning, the model is unsuitable for tasks like machine translation or named entity recognition.
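
The intended use above can be sketched with the `transformers` text-classification pipeline. This is a minimal sketch, not the author's documented usage: it assumes the `Hailay/FT_EXLMR` repository ships a tokenizer and a sequence-classification head that `pipeline` can load, and the label names it returns depend on the model's own configuration, which this card does not list. The helper names `top_label` and `classify` are hypothetical.

```python
from typing import Any, Dict, List

MODEL_ID = "Hailay/FT_EXLMR"  # model id taken from this card


def top_label(scores: List[Dict[str, Any]]) -> str:
    """Return the label with the highest score from one pipeline result."""
    return max(scores, key=lambda item: item["score"])["label"]


def classify(texts: List[str]) -> List[str]:
    """Classify texts with the fine-tuned model (downloads weights on first use)."""
    # Imported lazily so top_label stays usable without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline for the scores of every label, not just the best one.
    classifier = pipeline("text-classification", model=MODEL_ID, top_k=None)
    return [top_label(result) for result in classifier(texts)]


# Example call (requires network access and the transformers package):
# classify(["<Amharic, Tigrinya, or Oromo text>"])
```

Returning all label scores and picking the maximum in a separate helper keeps the post-processing step testable without loading the model.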

	## 3. Training Data
	The Hailay/FT_EXLMR model was trained on the dataset from SemEval 2023 Shared Task 12: Sentiment Analysis in African Languages (AfriSenti-SemEval).
	This dataset comprises sentiment-labeled text in 14 African languages:

	1. Algerian Arabic (arq) - Algeria
	2. Amharic (amh) - Ethiopia
	3. Hausa (hau) - Nigeria
	4. Igbo (ibo) - Nigeria
	5. Kinyarwanda (kin) - Rwanda
	6. Moroccan Arabic/Darija (ary) - Morocco
	7. Mozambican Portuguese (pt-MZ) - Mozambique
	8. Nigerian Pidgin (pcm) - Nigeria
	9. Oromo (orm) - Ethiopia
	10. Swahili (swa) - Kenya/Tanzania
	11. Tigrinya (tir) - Ethiopia
	12. Twi (twi) - Ghana
	13. Xitsonga (tso) - Mozambique
	14. Yoruba (yor) - Nigeria

	The dataset provides diverse data for training multilingual models such as Hailay/FT_EXLMR.
	It was obtained from the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
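
A quick way to inspect sentiment-labeled data of this kind is to parse it into (text, label) pairs and count the labels. The sketch below assumes tab-separated files with a header row and columns named `tweet` and `label`; those column names, the helper names, and the sample rows are assumptions for illustration and should be checked against the files in the repository.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed column names; verify them against the files you download.
TEXT_COLUMN = "tweet"
LABEL_COLUMN = "label"


def read_examples(tsv_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated lines with a header row into (text, label) pairs."""
    # QUOTE_NONE keeps quotation marks inside the text from being treated as CSV quoting.
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[TEXT_COLUMN], row[LABEL_COLUMN]) for row in reader]


def label_counts(examples: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many examples carry each sentiment label."""
    return Counter(label for _, label in examples)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "ID\ttweet\tlabel\n"
    "1\texample text one\tpositive\n"
    "2\texample text two\tnegative\n"
    "3\texample text three\tpositive\n"
)
examples = read_examples(sample)
print(label_counts(examples))
```

For a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) instead of the in-memory sample.
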

	The Hailay/FT_EXLMR model was trained with the following configuration:
	- Epochs: 3
	- Learning Rate: 1e-5
	- Optimizer: AdamW
	- Batch Size: 16
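
The configuration above could be expressed as follows. This is a sketch under stated assumptions: the card does not say which training loop was used, so mapping the values onto `transformers.TrainingArguments` (and onto the `adamw_torch` optimizer variant) is an assumption, and the helper names and the dataset size in the usage note are hypothetical.

```python
import math

# Hyperparameters as listed in this card, keyed by TrainingArguments field names.
HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "optim": "adamw_torch",  # AdamW; the exact implementation is an assumption
}


def total_optimizer_steps(num_examples: int, num_devices: int = 1) -> int:
    """Optimizer updates for the listed epochs and batch size, without gradient accumulation."""
    effective_batch = HYPERPARAMS["per_device_train_batch_size"] * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * HYPERPARAMS["num_train_epochs"]


def make_training_arguments(output_dir: str):
    """Build transformers.TrainingArguments from the hyperparameters above."""
    # Lazy import: this call requires the transformers and torch packages.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

As a worked example, a hypothetical training set of 1,000 examples on one device gives `ceil(1000 / 16) = 63` steps per epoch, so `total_optimizer_steps(1000)` returns 189 over 3 epochs.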

	## 4. Evaluation

	The model was evaluated using accuracy and loss as the primary metrics:

	- Accuracy: The model performed well on Tigrinya, Amharic, Afar, and Oromo text classification and sentiment analysis tasks.
	- Loss: Loss values converged steadily over the 3 training epochs, indicating stable training.

	The evaluation was carried out on the test set provided in the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
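
Accuracy, the first metric named above, is the fraction of test examples whose predicted label matches the gold label. The sketch below shows one way to compute it overall and per language; the function names, language codes, and labels are hypothetical, and it is not the author's evaluation script.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def accuracy_by_language(
    languages: Sequence[str], y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, float]:
    """Accuracy computed separately for each language code."""
    buckets = defaultdict(lambda: [0, 0])  # language -> [correct, total]
    for lang, t, p in zip(languages, y_true, y_pred):
        buckets[lang][0] += int(t == p)
        buckets[lang][1] += 1
    return {lang: correct / total for lang, (correct, total) in buckets.items()}
```

Reporting accuracy per language alongside the overall figure makes it easier to see whether a multilingual model performs unevenly across languages.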