---
library_name: transformers
tags:
- open data ma
- questions
- intents
- classification
- function calling
license: apache-2.0
language:
- fr
metrics:
- accuracy
pipeline_tag: text-classification
datasets:
- tferhan/Data_Gov_Ma_FAQ
---

# Model Card for Intent-GovMa-v1

This model is fine-tuned from `camembert-base` to classify the intent of French-language user questions
on the website data.gov.ma. It distinguishes whether a user is making a general inquiry or requesting
specific data. The training data was generated with GPT-4o-mini and includes information specific
to data.gov.ma. The model was fine-tuned using LoRA, achieving an accuracy of up to 0.98.


## Model Details

### Model Description

- **Developed by:** TFERHAN
- **Language:** French
- **License:** Apache 2.0
- **Finetuned from model:** camembert-base

## Use Case

- **Purpose:** Classify user intent questions for the chatbot on the data.gov.ma website.
- **Languages:** Optimized for French; performance degrades markedly on other languages.
- **Data Source:** Generated using GPT-4o-mini with data from data.gov.ma.

## Uses

### Direct Use

The model can be directly used to classify user intents in chatbot scenarios for the website data.gov.ma, distinguishing between general inquiries and data requests.

### Downstream Use

The model is particularly suited for applications involving the French language and can be integrated into larger chatbot systems or
fine-tuned further for similar tasks in different contexts.

### Out-of-Scope Use

- Use with languages other than French without further fine-tuning.
- Applications that do not involve French-language queries.
- Sensitive or high-stakes applications without extensive validation.

## Bias, Risks, and Limitations

### Technical Limitations

- Performance may degrade significantly on languages other than French.
- Limited to intents related to general queries and data requests.

### Recommendations

- The model should be retrained or fine-tuned with appropriate data before deployment in non-French contexts.
- Continuous monitoring and evaluation should be conducted to ensure reliability and fairness.

## How to Get Started with the Model

Use the code snippet below to get started with the model:

```python
from transformers import AutoTokenizer, pipeline
import torch
from peft import AutoPeftModelForSequenceClassification

model_name = "tferhan/Intent-GovMa-v1"

# Load the tokenizer and the LoRA-adapted classification model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoPeftModelForSequenceClassification.from_pretrained(model_name)
nlp_pipeline = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

questions = ["qu'est ce que open data", "je veux les informations de l'eau potable"]
results = nlp_pipeline(questions)

for result in results:
    print(result)

# {'label': 'LABEL_0', 'score': 0.9999700784683228} === general
# {'label': 'LABEL_1', 'score': 0.9994990825653076} === request_data
```

## Training Details

### Training Data

- **Data Source:** Generated using GPT-4o-mini, seeded with terms and data from data.gov.ma.

### Training Procedure

- **Preprocessing:** 
  - Standard text preprocessing steps - tokenization, text cleaning, and normalization.
- **Training Hyperparameters:**
  - Epochs: `10`
  - Train Batch Size: `4`
  - Eval Batch Size: `4`
  - Learning Rate: `2e-5`
  - Evaluation Strategy: `epoch`
  - Weight Decay: `0.01`
- **Log History:** `log_history.json`
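
The hyperparameters above correspond to a setup along these lines. This is a sketch, not the exact training script; in particular the LoRA settings (`r`, `lora_alpha`, `lora_dropout`) are illustrative assumptions, since the card does not list them:

```python
from transformers import TrainingArguments
from peft import LoraConfig, TaskType

# LoRA adapter configuration for sequence classification.
# r, lora_alpha, and lora_dropout are assumed values for illustration;
# the card does not specify them.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
)

# Hyperparameters as listed in this card.
training_args = TrainingArguments(
    output_dir="intent-govma-v1",
    num_train_epochs=10,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-5,
    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
    weight_decay=0.01,
)
```

These objects would then be passed to `get_peft_model` and a `Trainer` in the usual PEFT workflow.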

## Evaluation

### Testing Data & Metrics

- **Testing Data:** Subset of the generated data based on data.gov.ma.
- **Evaluation Metrics:** Accuracy.
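
Accuracy here is the fraction of test examples whose predicted label matches the gold label. With the `Trainer` API it is typically computed by a metric callback such as this sketch (the implementation is illustrative; `compute_metrics` is the conventional name the `Trainer` expects):

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair as passed by the Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class per example
    accuracy = float((preds == labels).mean())
    return {"accuracy": accuracy}
```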

### Results

- **Maximum Accuracy:** 0.98