|
--- |
|
library_name: transformers |
|
tags: |
|
- open data ma |
|
- questions |
|
- intents |
|
- classification |
|
- function calling |
|
license: apache-2.0 |
|
language: |
|
- fr |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-classification |
|
datasets: |
|
- tferhan/Data_Gov_Ma_FAQ |
|
--- |
|
|
|
# Model Card for Intent-GovMa-v1 |
|
|
|
This model is fine-tuned from `camembert-base` to classify the intent of French-language user questions |
|
on the data.gov.ma website. It distinguishes general inquiries from requests for specific data. |
|
The training data was generated with GPT-4o-mini and includes information specific to data.gov.ma. |
|
The model was fine-tuned with LoRA, using the hyperparameters listed under Training Procedure, and reaches an accuracy of up to 0.98. |
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** TFERHAN |
|
- **Language:** French |
|
- **License:** Apache 2.0 |
|
- **Finetuned from model:** camembert-base |
|
|
|
## Use Case |
|
|
|
- **Purpose:** Classify the intent of user questions submitted to the chatbot on the data.gov.ma website. |
|
- **Languages:** Optimized for French; performs poorly on other languages. |
|
- **Data Source:** Generated using GPT-4o-mini with data from data.gov.ma. |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
The model can be directly used to classify user intents in chatbot scenarios for the website data.gov.ma, distinguishing between general inquiries and data requests. |
|
|
|
### Downstream Use |
|
|
|
The model is particularly suited for applications involving the French language and can be integrated into larger chatbot systems or |
|
fine-tuned further for similar tasks in different contexts. |
|
|
|
### Out-of-Scope Use |
|
|
|
- Use on languages other than French without further fine-tuning. |
|
- Applications that do not involve French-language queries. |
|
- Sensitive or highly critical applications without extensive validation. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
### Technical Limitations |
|
|
|
- Performance may degrade significantly on languages other than French. |
|
- Limited to intents related to general queries and data requests. |
|
|
|
### Recommendations |
|
|
|
- The model should be retrained or fine-tuned with appropriate data before deployment in non-French contexts. |
|
- Continuous monitoring and evaluation should be conducted to ensure reliability and fairness. |
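
One way to approach the monitoring recommendation is to track how often predictions fall below a confidence threshold. The sketch below assumes predictions arrive as `{'label', 'score'}` dictionaries, the format returned by the `transformers` text-classification pipeline. The function name `summarize_predictions`, the 0.90 threshold, and the sample scores are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical threshold; tune it on held-out data.gov.ma questions.
LOW_CONFIDENCE_THRESHOLD = 0.90


def summarize_predictions(predictions, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Summarize a batch of pipeline outputs for monitoring purposes."""
    label_counts = Counter(p["label"] for p in predictions)
    low_confidence = [p for p in predictions if p["score"] < threshold]
    total = len(predictions)
    return {
        "total": total,
        "label_counts": dict(label_counts),
        # Share of predictions the model was unsure about.
        "low_confidence_rate": len(low_confidence) / total if total else 0.0,
    }


# Made-up scores for illustration (not real model output).
sample = [
    {"label": "LABEL_0", "score": 0.99},
    {"label": "LABEL_1", "score": 0.62},
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.55},
]
report = summarize_predictions(sample)
print(report)
```

A rising low-confidence rate or a sudden shift in the label distribution can be a signal that incoming questions no longer resemble the training data.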
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code snippet below to get started with the model: |
|
|
|
```python |
|
from transformers import AutoTokenizer, pipeline |
|
import torch |
|
from peft import AutoPeftModelForSequenceClassification |
|
|
|
|
|
model_name = "tferhan/Intent-GovMa-v1" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoPeftModelForSequenceClassification.from_pretrained(model_name) |
|
nlp_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1) |
|
|
|
questions = ["qu'est ce que open data", "je veux les informations de l'eau potable"] |
|
results = nlp_pipeline(questions) |
|
|
|
for result in results: |
|
print(result) |
|
|
|
#{'label': 'LABEL_0', 'score': 0.9999700784683228} === general |
|
#{'label': 'LABEL_1', 'score': 0.9994990825653076} === request_data |
|
``` |
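
The pipeline returns generic `LABEL_0` / `LABEL_1` names, so a chatbot will usually need to translate them into intent names before routing. The following is a minimal sketch of that step, using the label meanings given in the comments of the snippet above. The helper `to_intent`, the `"unknown"` fallback, and the `min_score` cutoff are assumptions for illustration, not part of the model.

```python
# Label meanings taken from the comments in the getting-started snippet.
ID2INTENT = {"LABEL_0": "general", "LABEL_1": "request_data"}


def to_intent(prediction, min_score=0.5):
    """Map a pipeline prediction to an intent name, or "unknown" if unsure.

    `min_score` is a hypothetical cutoff, not a value from the model card.
    """
    if prediction["score"] < min_score:
        return "unknown"
    # Unrecognized labels also fall back to "unknown".
    return ID2INTENT.get(prediction["label"], "unknown")


# The two example outputs shown in the snippet above.
examples = [
    {"label": "LABEL_0", "score": 0.9999700784683228},
    {"label": "LABEL_1", "score": 0.9994990825653076},
]
intents = [to_intent(p) for p in examples]
print(intents)  # ['general', 'request_data']
```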
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
- **Data Source:** Generated with GPT-4o-mini, guided by vocabulary and data from data.gov.ma. |
|
|
|
### Training Procedure |
|
|
|
- **Preprocessing:** |
|
  - Standard text preprocessing steps: tokenization, text cleaning, and normalization. |
|
- **Training Hyperparameters:** |
|
  - Epochs: `10` |
|
  - Train Batch Size: `4` |
|
  - Eval Batch Size: `4` |
|
  - Learning Rate: `2e-5` |
|
  - Evaluation Strategy: `epoch` |
|
  - Weight Decay: `0.01` |
|
- **Log History:** `log_history.json` |
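
As a rough guide, the hyperparameters listed above might map onto `transformers.TrainingArguments` keyword arguments as sketched below. This is an assumption about how the values were applied, not the author's actual training script. In particular, the batch sizes are assumed to be per-device values, and the LoRA settings (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are not documented in this card and are purely illustrative. The names `build_configs` and `intent-govma-lora` are hypothetical.

```python
# Values copied from the hyperparameter list in this card.
TRAINING_KWARGS = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 4,  # assumed to be a per-device value
    "per_device_eval_batch_size": 4,   # assumed to be a per-device value
    "learning_rate": 2e-5,
    # Named `evaluation_strategy` in older transformers releases.
    "eval_strategy": "epoch",
    "weight_decay": 0.01,
}

# ASSUMPTION: LoRA settings are not given in this card; illustrative only.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "target_modules": ["query", "value"],  # attention projections in CamemBERT
}


def build_configs(output_dir="intent-govma-lora"):
    """Build PEFT and Trainer configs from the dictionaries above.

    Imports are deferred so the dictionaries can be inspected without
    transformers or peft installed.
    """
    from peft import LoraConfig, TaskType
    from transformers import TrainingArguments

    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, **LORA_KWARGS)
    training_args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    return lora_config, training_args
```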
|
|
|
## Evaluation |
|
|
|
### Testing Data & Metrics |
|
|
|
- **Testing Data:** Subset of the generated data based on data.gov.ma. |
|
- **Evaluation Metrics:** Accuracy. |
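
For reference, accuracy is the fraction of test questions whose predicted intent matches the expected one, so a value such as 0.98 corresponds to 98%. The sketch below shows the calculation on toy labels. The function name and the sample data are illustrative and are not taken from the actual evaluation.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predicted_labels and true_labels must have equal length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Toy labels for illustration only.
predicted = ["general", "request_data", "general", "general"]
expected = ["general", "request_data", "request_data", "general"]
score = accuracy(predicted, expected)
print(f"{score:.2f} ({score:.0%})")  # 0.75 (75%)
```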
|
|
|
### Results |
|
|
|
- **Maximum Accuracy:** 0.98 (98%) |