---
library_name: transformers
tags:
- OpenData Morocco
- chatbot
- questions
- intents
- classification
- function calling
license: apache-2.0
language:
- fr
metrics:
- accuracy
pipeline_tag: text-classification
datasets:
- tferhan/Data_Gov_Ma_FAQ
base_model: almanach/camembert-base
---

# Model Card for tferhan/finetuned_camb_intents

This model is fine-tuned from `camembert-base` and classifies the intent of French-language user questions on the data.gov.ma website. It distinguishes general inquiries from requests for specific data. The training data was generated with GPT-4o-mini and includes information specific to data.gov.ma. The model was fine-tuned with LoRA, using the hyperparameters listed under Training Details, and reached an accuracy of up to 0.98.

## Model Details

### Model Description

- **Developed by:** TFERHAN
- **Language:** French
- **License:** Apache 2.0
- **Finetuned from model:** camembert-base

## Use Case

- **Purpose:** Classify the intent of user questions for the chatbot on the data.gov.ma website.
- **Languages:** Optimized for French; performs poorly on other languages.
- **Data Source:** Generated using GPT-4o-mini with data from data.gov.ma.

## Uses

### Direct Use

The model can be used directly to classify user intents in chatbot scenarios for the data.gov.ma website, distinguishing between general inquiries and data requests.

### Downstream Use

The model is best suited to French-language applications. It can be integrated into larger chatbot systems or fine-tuned further for similar tasks in other contexts.

### Out-of-Scope Use

- Use on other languages without fine-tuning.
- Applications that do not involve French-language queries.
- Sensitive or highly critical applications without extensive validation.

## Bias, Risks, and Limitations

### Technical Limitations

- Performance may degrade significantly on languages other than French.
- Limited to two intents: general queries and data requests.

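
Because the classifier only knows two intents, a chatbot that integrates it may want to treat weak or unexpected predictions as "unknown" instead of acting on them. The following is a minimal sketch of that idea, not part of the released model. It assumes the label mapping reported in this card's usage example (`LABEL_0` is general, `LABEL_1` is request_data); the `route_intent` helper and the `0.8` threshold are hypothetical and would need tuning on held-out data.

```python
from typing import Dict, Union

# Label mapping as reported in this card's usage example.
LABEL_TO_INTENT = {"LABEL_0": "general", "LABEL_1": "request_data"}

# Hypothetical cut-off (an assumption, not a value from the model card).
MIN_CONFIDENCE = 0.8


def route_intent(
    prediction: Dict[str, Union[str, float]],
    min_confidence: float = MIN_CONFIDENCE,
) -> str:
    """Map one pipeline prediction to an intent name, or "unknown"."""
    intent = LABEL_TO_INTENT.get(str(prediction["label"]))
    # Fall back when the label is unexpected or the score is too low.
    if intent is None or float(prediction["score"]) < min_confidence:
        return "unknown"
    return intent


if __name__ == "__main__":
    print(route_intent({"label": "LABEL_0", "score": 0.9999700784683228}))  # general
    print(route_intent({"label": "LABEL_1", "score": 0.55}))  # unknown
```

A low score is only a rough signal: a two-class classifier still assigns one of its labels to off-topic or non-French input, so a threshold like this complements validation on real traffic and does not replace it.
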
### Recommendations

- Retrain or fine-tune the model with appropriate data before deploying it in non-French contexts.
- Monitor and evaluate the model continuously to ensure reliability and fairness.

## How to Get Started with the Model

Use the code snippet below to get started with the model:

```python
from transformers import pipeline

model_name = "tferhan/finetuned_camb_intents"
# Build a text-classification pipeline from the fine-tuned checkpoint.
nlp_pipeline = pipeline("text-classification", model=model_name)

questions = ["qu'est ce que open data", "je veux les informations de l'eau potable"]
# Each result is a dict holding the predicted label and its score.
results = nlp_pipeline(questions)

for result in results:
    print(result)

# {'label': 'LABEL_0', 'score': 0.9999700784683228}  -> general
# {'label': 'LABEL_1', 'score': 0.9994990825653076}  -> request_data
```

## Training Details

### Training Data

- **Data Source:** Generated using GPT-4o-mini, with the help of vocabulary and data from data.gov.ma.

### Training Procedure

- **Preprocessing:**
  - Standard text preprocessing steps: tokenization, text cleaning, and normalization.
- **Training Hyperparameters:**
  - Epochs: `10`
  - Train Batch Size: `4`
  - Eval Batch Size: `4`
  - Learning Rate: `2e-5`
  - Evaluation Strategy: `epoch`
  - Weight Decay: `0.01`
- **Log History:** `log_history.json`

## Evaluation

### Testing Data & Metrics

- **Testing Data:** Subset of the generated data based on data.gov.ma.
- **Evaluation Metrics:** Accuracy.

### Results

- **Maximum Accuracy:** 0.98 (98%)
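
For readers who want to check the accuracy metric on their own labeled questions, the computation can be sketched as below. This is an illustrative helper, not the evaluation script used for this model; the `accuracy` function and the sample labels are assumptions made for the example.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    # Made-up labels for illustration only; these are not the card's results.
    predicted = ["LABEL_0", "LABEL_1", "LABEL_1", "LABEL_0"]
    expected = ["LABEL_0", "LABEL_1", "LABEL_0", "LABEL_0"]
    print(accuracy(predicted, expected))  # 0.75
```

In practice, `predicted` would hold the `label` field of each pipeline result and `expected` the gold labels of the held-out subset.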