---
library_name: transformers
tags:
- OpenData Morocco
- chatbot
- questions
- intents
- classification
- function calling
license: apache-2.0
language:
- fr
metrics:
- accuracy
pipeline_tag: text-classification
datasets:
- tferhan/Data_Gov_Ma_FAQ
base_model: almanach/camembert-base
---

# Model Card for finetuned_camb_intents

This model is fine-tuned from `camembert-base` to classify the intent of French-language user questions
on the data.gov.ma website. It distinguishes general inquiries from requests for specific data.
The training data was generated with GPT-4o-mini and includes information specific to data.gov.ma.
The model was fine-tuned with LoRA, using the hyperparameters listed under Training Details, and reaches an accuracy of up to 0.98.


## Model Details

### Model Description


- **Developed by:** TFERHAN
- **Language:** French
- **License:** Apache 2.0
- **Fine-tuned from model:** `almanach/camembert-base`

## Use Case

- **Purpose:** Classify the intent of user questions for the chatbot on the data.gov.ma website.
- **Languages:** French only; performance on other languages is poor.
- **Data Source:** Generated using GPT-4o-mini with data from data.gov.ma.

## Uses

### Direct Use

The model can be used directly to classify user intents in the data.gov.ma chatbot, distinguishing general inquiries from data requests.
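
In a chatbot, the predicted label decides which component answers the question. The following is a minimal routing sketch, assuming that `LABEL_0` means a general inquiry and `LABEL_1` a data request, as in the sample output under "How to Get Started". The 0.8 confidence threshold and the `fallback` route are hypothetical choices, not part of this model.

```python
# Sketch: map one text-classification prediction to a chatbot route.
# Assumption: LABEL_0 = general inquiry, LABEL_1 = data request
# (taken from the sample output in "How to Get Started").
# The threshold and the "fallback" route are hypothetical.

LABEL_TO_INTENT = {"LABEL_0": "general", "LABEL_1": "request_data"}


def route(prediction: dict, threshold: float = 0.8) -> str:
    """Return 'general', 'request_data', or 'fallback' for one pipeline result.

    `prediction` has the shape returned by a transformers
    text-classification pipeline: {"label": ..., "score": ...}.
    """
    intent = LABEL_TO_INTENT.get(prediction["label"])
    # Unknown labels and low-confidence predictions go to a fallback handler
    if intent is None or prediction["score"] < threshold:
        return "fallback"
    return intent


if __name__ == "__main__":
    print(route({"label": "LABEL_1", "score": 0.9995}))  # request_data
    print(route({"label": "LABEL_0", "score": 0.55}))    # fallback
```

Each element of the list returned by the pipeline can be passed to `route`. The threshold is best tuned on validation data rather than taken from this sketch.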

### Downstream Use

The model is particularly suited for applications involving the French language and can be integrated into larger chatbot systems or
fine-tuned further for similar tasks in different contexts.

### Out-of-Scope Use

- Use on languages other than French without further fine-tuning.
- Applications that do not involve French language queries.
- Sensitive or highly critical applications without extensive validation.

## Bias, Risks, and Limitations

### Technical Limitations

- Performance may degrade significantly on languages other than French.
- Limited to two intents: general inquiries and data requests.

### Recommendations

- The model should be retrained or fine-tuned with appropriate data before deployment in non-French contexts.
- Continuous monitoring and evaluation should be conducted to ensure reliability and fairness.
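
One possible form of such monitoring is to track, per batch of live questions, the label distribution and the share of low-confidence predictions; a shift in either may indicate that user queries have drifted away from the training data. The sketch below assumes pipeline-shaped predictions; the function name `monitor` and the 0.8 threshold are hypothetical.

```python
from collections import Counter


def monitor(batch, threshold=0.8):
    """Summarize (question, prediction) pairs from a text-classification pipeline.

    Returns per-label counts, the share of predictions scoring below
    `threshold`, and the low-confidence questions for manual review.
    The threshold value is a hypothetical default.
    """
    items = list(batch)
    label_counts = Counter(pred["label"] for _, pred in items)
    to_review = [question for question, pred in items if pred["score"] < threshold]
    low_confidence_rate = len(to_review) / len(items) if items else 0.0
    return {
        "label_counts": dict(label_counts),
        "low_confidence_rate": low_confidence_rate,
        "to_review": to_review,
    }
```

With a list of questions and the pipeline's results, a call such as `monitor(zip(questions, results))` produces one summary per batch, which can be logged and compared over time.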

## How to Get Started with the Model

Use the code snippet below to get started with the model:

```python
from transformers import pipeline

model_name = "tferhan/finetuned_camb_intents"

# Load the fine-tuned intent classifier from the Hugging Face Hub
nlp_pipeline = pipeline("text-classification", model=model_name)

# French inputs: "what is open data", "I want information about drinking water"
questions = ["qu'est ce que open data", "je veux les informations de l'eau potable"]
results = nlp_pipeline(questions)

for result in results:
    print(result)

# {'label': 'LABEL_0', 'score': 0.9999700784683228}  -> general
# {'label': 'LABEL_1', 'score': 0.9994990825653076}  -> request_data
```

## Training Details

### Training Data

- **Data Source:** Generated with GPT-4o-mini, using vocabulary and data from data.gov.ma.

### Training Procedure

- **Preprocessing:**
  - Standard text preprocessing: tokenization, text cleaning, and normalization.
- **Training Hyperparameters:**
  - Epochs: `10`
  - Train Batch Size: `4`
  - Eval Batch Size: `4`
  - Learning Rate: `2e-5`
  - Evaluation Strategy: `epoch`
  - Weight Decay: `0.01`
- **Log History:** `log_history.json`
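
The hyperparameters above correspond to arguments of `transformers.TrainingArguments`. The mapping below is a sketch that assumes the standard `Trainer` was used; it omits the LoRA settings (rank, alpha, target modules), which this card does not list. The evaluation-strategy argument is named `eval_strategy` in recent `transformers` releases and `evaluation_strategy` in older ones. The helper `total_optimizer_steps` is hypothetical.

```python
import math

# Hyperparameters from this card, keyed by transformers.TrainingArguments
# argument names. LoRA settings are not listed in the card, so none appear here.
TRAINING_KWARGS = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "learning_rate": 2e-5,
    "eval_strategy": "epoch",  # "evaluation_strategy" in older releases
    "weight_decay": 0.01,
}


def total_optimizer_steps(num_train_examples: int, config: dict = TRAINING_KWARGS) -> int:
    """Estimate optimizer steps on a single device without gradient accumulation.

    The last partial batch of each epoch still counts as one step.
    """
    steps_per_epoch = math.ceil(num_train_examples / config["per_device_train_batch_size"])
    return steps_per_epoch * config["num_train_epochs"]
```

With `transformers` installed, the dictionary could be unpacked as `TrainingArguments(output_dir=..., **TRAINING_KWARGS)`. Because the training set size is not stated in this card, `total_optimizer_steps` has to be called with the actual number of training examples.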

## Evaluation

### Testing Data & Metrics

- **Testing Data:** Subset of the generated data based on data.gov.ma.
- **Evaluation Metrics:** Accuracy.
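
Accuracy is the fraction of test questions whose predicted label matches the reference label. When training with the `transformers` `Trainer`, it is commonly computed by a `compute_metrics` callback like the sketch below; whether this model's author used exactly this function is an assumption. The `Trainer` prefixes returned keys with `eval_`, so the value appears as `eval_accuracy` in the logs.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from the (logits, label_ids) pair the Trainer passes in.

    Assumption: one logit column per class (LABEL_0, LABEL_1).
    """
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}
```

The function would be passed as `Trainer(..., compute_metrics=compute_metrics)`.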

### Results

- **Maximum Accuracy:** 0.98 (98%)
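
The maximum accuracy can be read back from the training log. The sketch below assumes that `log_history.json` holds the `Trainer`'s `state.log_history` list (or a `trainer_state.json`-style object with a `log_history` key) and that evaluation entries carry an `eval_accuracy` key. The function name `best_eval_accuracy` is hypothetical; adjust the key if the file is structured differently.

```python
import json
from pathlib import Path


def best_eval_accuracy(path):
    """Return (best eval_accuracy, epoch at which it was logged).

    Accepts either a bare list of log entries or an object whose
    "log_history" key holds that list (the trainer_state.json layout).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    entries = data["log_history"] if isinstance(data, dict) else data
    # Training-loss entries have no eval_accuracy key and are skipped
    evals = [entry for entry in entries if "eval_accuracy" in entry]
    if not evals:
        raise ValueError("no eval_accuracy entries found")
    best = max(evals, key=lambda entry: entry["eval_accuracy"])
    return best["eval_accuracy"], best.get("epoch")
```

Calling `best_eval_accuracy("log_history.json")` would then report the highest logged accuracy together with its epoch.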