Model Card for Arabic Text Classification Model
This model classifies Arabic text into one of seven categories using FastText’s supervised learning method. It is particularly suitable for tasks requiring rapid text categorization in Arabic.
Model Details
Model Description
- Developed by: Tevfik İstanbullu
- Model type: Supervised classification model using FastText embeddings
- Language(s) (NLP): Arabic
- License: Apache License 2.0
Model Sources
- Repository: https://huggingface.co/Tevfik34/arabic-text-classifier-fasttext
Uses
Direct Use
This model is intended for direct use in text classification tasks within the Arabic language. It can be deployed in applications for organizing news articles, automating customer support categorization, or any other domain-specific categorization tasks.
Out-of-Scope Use
The model is not designed for tasks outside of the Arabic language or for multi-label classifications where multiple labels are assigned to a single text instance.
Bias, Risks, and Limitations
This model is trained on publicly available Arabic text data with specific categories (Finance, Sports, Politics, Medical, Tech, Culture, Religion). It may contain biases present in the original dataset and may not perform equally well on all Arabic dialects. Users should test the model in their specific applications to assess accuracy and suitability.
Recommendations
Users should be made aware of the risks, biases, and limitations of the model, and should evaluate it on their own data before deployment.
How to Get Started with the Model
Use the code below to get started with the model.
```python
import fasttext

# Load the trained model
model = fasttext.load_model("path_to_your_model.bin")

# Make a prediction
labels, probs = model.predict("Sample Arabic text")
print(f"Predicted Label: {labels[0]}, Probability: {probs[0]}")
```
Training Details
Training Data
- Data Size: 194,317 Arabic text samples
- Categories: 7 (Finance, Sports, Politics, Medical, Tech, Culture, Religion)
Training Procedure
- Embedding Dimension: 300
- Epochs: 25
- Learning Rate: 0.1
- Word N-grams: 3
- Min Count: 1

These parameters were selected to enhance the model’s ability to capture context in Arabic text and perform well across a diverse range of categories.
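For reference, FastText's supervised mode expects one example per line, with the category given as a `__label__` prefix. The sketch below shows this input format; the helper name and sample texts are illustrative, not taken from the model repository.

```python
# Sketch: formatting (label, text) pairs into FastText's supervised input format.
# Function name and sample data are illustrative assumptions, not from the repo.

def to_fasttext_line(label: str, text: str) -> str:
    """Prefix the category with __label__ and join it to the text, one example per line."""
    return f"__label__{label} {text.strip()}"

samples = [
    ("Sports", "فاز الفريق بالمباراة النهائية"),
    ("Tech", "أطلقت الشركة هاتفاً ذكياً جديداً"),
]

lines = [to_fasttext_line(label, text) for label, text in samples]
print(lines[0])  # __label__Sports فاز الفريق بالمباراة النهائية
```

With a training file in this format, the hyperparameters above map directly onto FastText's Python API, e.g. `fasttext.train_supervised(input="train.txt", dim=300, epoch=25, lr=0.1, wordNgrams=3, minCount=1)` (file name assumed for illustration).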
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluated on a hold-out test set of 6,428 Arabic text samples, representing the seven categories.
Metrics
- Accuracy: The fraction of test samples assigned the correct category.
- Precision: Of the samples predicted for a category, the fraction that actually belong to it.
- Recall: Of the samples in a category, the fraction the model correctly identifies.
- F1-score: The harmonic mean of precision and recall, balancing these two metrics.
Results
- Precision: 96.20%
- Recall: 95.40%
- F1: 95.79%
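As a sanity check, the reported F1 follows from precision and recall via the harmonic mean:

```python
# Recompute the F1-score from the reported precision and recall.
precision = 0.9620
recall = 0.9540

f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.2%}")  # ≈ 95.80%, matching the reported 95.79% up to rounding
```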