---
library_name: transformers
tags:
- open data ma
- questions
- intents
- classification
- function calling
license: apache-2.0
language:
- fr
metrics:
- accuracy
pipeline_tag: text-classification
datasets:
- tferhan/Data_Gov_Ma_FAQ
---
# Model Card for Intent-GovMa-v1
This model is a fine-tuned version of `camembert-base` that classifies the intent of French-language
questions submitted to the data.gov.ma website. It distinguishes general inquiries from requests
for specific data. The training data was generated with GPT-4o-mini and includes information specific
to data.gov.ma. The model was fine-tuned with LoRA and reaches an accuracy of up to 0.98.
## Model Details
### Model Description
- **Developed by:** TFERHAN
- **Language:** French
- **License:** Apache 2.0
- **Finetuned from model:** camembert-base
## Use Case
- **Purpose:** Classify the intent of user questions for the chatbot on the data.gov.ma website.
- **Languages:** Optimized for French; performs poorly on other languages.
- **Data Source:** Generated using GPT-4o-mini with data from data.gov.ma.
## Uses
### Direct Use
The model can be directly used to classify user intents in chatbot scenarios for the website data.gov.ma, distinguishing between general inquiries and data requests.
### Downstream Use
The model is particularly suited for applications involving the French language and can be integrated into larger chatbot systems or
fine-tuned further for similar tasks in different contexts.
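As one way to picture that integration, the sketch below routes a single classifier prediction to one of two handlers. The handler functions, the `route` helper, and the `0.5` confidence threshold are hypothetical and not part of this model; the label meanings (`LABEL_0` for general, `LABEL_1` for request_data) follow the output comments in the getting-started snippet.

```python
from typing import Callable, Dict


def answer_general(question: str) -> str:
    # Hypothetical handler: a real chatbot would answer from an FAQ here.
    return f"[general] {question}"


def fetch_data(question: str) -> str:
    # Hypothetical handler: a real chatbot would search the data catalogue here.
    return f"[request_data] {question}"


# Raw pipeline labels mapped to handlers (assumed mapping, see lead-in).
HANDLERS: Dict[str, Callable[[str], str]] = {
    "LABEL_0": answer_general,
    "LABEL_1": fetch_data,
}


def route(question: str, prediction: dict, min_score: float = 0.5) -> str:
    """Dispatch a question using one prediction of the form {'label', 'score'}."""
    handler = HANDLERS.get(prediction["label"])
    # Unknown labels and low-confidence predictions go to a fallback reply.
    if handler is None or prediction["score"] < min_score:
        return "[fallback] Could you rephrase your question?"
    return handler(question)
```

Keeping the dispatch table separate from the classifier makes it straightforward to add a handler if further intents are introduced by later fine-tuning.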
### Out-of-Scope Use
- Use on languages other than French without further fine-tuning.
- Applications that do not involve French-language queries.
- Sensitive or highly critical applications without extensive validation.
## Bias, Risks, and Limitations
### Technical Limitations
- Performance may degrade significantly on languages other than French.
- Limited to intents related to general queries and data requests.
### Recommendations
- The model should be retrained or fine-tuned with appropriate data before deployment in non-French contexts.
- Continuous monitoring and evaluation should be conducted to ensure reliability and fairness.
## How to Get Started with the Model
Use the code snippet below to get started with the model:
```python
import torch
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_name = "tferhan/Intent-GovMa-v1"

# Load the tokenizer and the LoRA adapter together with its base model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoPeftModelForSequenceClassification.from_pretrained(model_name)

# Run on the first GPU when available, otherwise on CPU.
nlp_pipeline = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

questions = ["qu'est ce que open data", "je veux les informations de l'eau potable"]
results = nlp_pipeline(questions)
for result in results:
    print(result)
# {'label': 'LABEL_0', 'score': 0.9999700784683228} === general
# {'label': 'LABEL_1', 'score': 0.9994990825653076} === request_data
```
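The raw `LABEL_0` / `LABEL_1` names can be made easier to read with a small mapping. This is a minimal sketch: `ID2NAME` and `readable` are illustrative names, the mapping is taken from the comments in the snippet above, and the sample scores are the ones shown there rather than fresh model output.

```python
# Mapping inferred from the example output comments (assumption, not a model API).
ID2NAME = {"LABEL_0": "general", "LABEL_1": "request_data"}


def readable(results):
    """Replace raw pipeline labels with intent names, leaving scores untouched."""
    return [
        {"intent": ID2NAME.get(r["label"], r["label"]), "score": r["score"]}
        for r in results
    ]


# Sample predictions in the format returned by the text-classification pipeline.
sample = [
    {"label": "LABEL_0", "score": 0.9999700784683228},
    {"label": "LABEL_1", "score": 0.9994990825653076},
]
for row in readable(sample):
    print(row["intent"], row["score"])
```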
## Training Details
### Training Data
- **Data Source:** Generated using GPT-4o-mini with the help of terms and data from data.gov.ma.
### Training Procedure
- **Preprocessing:**
  - Standard text preprocessing steps: tokenization, text cleaning, and normalization.
- **Training Hyperparameters:**
  - Epochs: `10`
  - Train Batch Size: `4`
  - Eval Batch Size: `4`
  - Learning Rate: `2e-5`
  - Evaluation Strategy: `epoch`
  - Weight Decay: `0.01`
- **Log History:** `log_history.json`
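
For reference, the hyperparameters listed above can be collected as keyword arguments in the style of `transformers.TrainingArguments`. This is a sketch, not the author's training script: `output_dir` is a placeholder, reading the batch sizes as per-device values is an assumption, and the LoRA settings are not given in this card and are therefore left out. The keyword for the evaluation strategy is `evaluation_strategy` in older `transformers` releases and `eval_strategy` in newer ones, so the helper takes the key name as a parameter.

```python
# Values copied from the hyperparameter list; key names follow TrainingArguments.
training_kwargs = {
    "output_dir": "intent-govma-out",      # placeholder, not from the card
    "num_train_epochs": 10,
    "per_device_train_batch_size": 4,      # assumes the listed size is per device
    "per_device_eval_batch_size": 4,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
}
EVAL_STRATEGY = "epoch"


def build_kwargs(eval_key: str = "eval_strategy") -> dict:
    """Return the kwargs with the evaluation strategy stored under `eval_key`."""
    return {**training_kwargs, eval_key: EVAL_STRATEGY}


# With transformers installed, these could be passed on as:
#   TrainingArguments(**build_kwargs())
print(build_kwargs())
```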
## Evaluation
### Testing Data & Metrics
- **Testing Data:** Subset of the generated data based on data.gov.ma.
- **Evaluation Metrics:** Accuracy.
### Results
- **Maximum Accuracy:** 0.98 (98%)
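
Accuracy here is the fraction of test questions whose predicted label matches the reference label. A minimal sketch of that computation follows; the label lists are made up for illustration and are not the model's actual evaluation data.

```python
def accuracy(predictions, references):
    """Fraction of positions where the predicted label equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Illustrative labels only: 0 = general, 1 = request_data.
preds = [0, 1, 1, 0, 1]
refs = [0, 1, 0, 0, 1]
print(accuracy(preds, refs))  # 4 of 5 predictions match
```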