---
language:
- en
license: gpl
tags:
- autograding
- essay question
- sentence similarity
metrics:
- accuracy
library_name: peft
datasets:
- mohamedemam/Essay-quetions-auto-grading
---

# Model Card for EM2 (Mistral)

A fine-tuned version of Mistral trained on the Essay-quetions-auto-grading dataset.


### Model Description


We are thrilled to introduce our graduation project, the EM2 model, designed for automated essay grading in both Arabic and English. 📝✨

To develop this model, we first created a custom dataset for training. We adapted the QuAC and OpenOrca datasets to make them suitable for our automated essay grading application.

The EM2 project includes fine-tuned versions of the following models, with their reported accuracy:

- Mistral: 96%
- LLaMA: 93%
- FLAN-T5: 93%
- BLOOMZ (Arabic): 86%
- MT0 (Arabic): 84%

You can try our models for auto-grading on Hugging Face! 🌐

We then deployed these models for practical use. We are proud of our team's hard work and the potential impact of the EM2 model in the field of education. 🌟

#MachineLearning #AI #Education #EssayGrading #GraduationProject

- **Developed by:** Mohamed Emam
- **Model type:** decoder-only
- **Language(s) (NLP):** English
- **License:** gpl
- **Finetuned from model:** Mistral



- **Repository:** https://github.com/mohamed-em2m/Automatic-Grading-AI
### Direct Use


Automatic grading of student answers to essay questions.

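### How It Works
- The model takes three inputs: a context (or a reference answer), a question about that context, and the student's answer. It then generates a short true/false verdict; the pipeline below also returns the probability that the answer is correct.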

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6456f2eca9b8e1fd4cbe5ebe/_O75HT2zb2TYZOEkX4YXO.png)

### Training Data
- **mohamedemam/Essay-quetions-auto-grading-arabic**


### Training Procedure

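The model was fine-tuned with the TRL library, training LoRA adapters on a 4-bit quantized base model (see Configuration below).

### Pipeline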
```python
import torch
import torch.nn.functional as F
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


class MyPipeline:
    """Scores a student answer against a context and a question with an EM2 adapter."""

    # Words whose next-token probability counts toward "correct" / "incorrect".
    YES_WORDS = ["Yes", "yes", "True", "true", "صحيح"]
    NO_WORDS = ["No", "no", "False", "false", "خطأ"]

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def chat_Format(self, context, quetion, answer):
        # Prompt template kept verbatim (including "/n" and "quetion") to match the format used with this model.
        return ("Instruction:/n check answer is true or false of next quetion using context below:\n"
                "#context: " + context + ".\n#quetion: " + quetion
                + ".\n#student answer: " + answer + ".\n#response:")

    def _token_ids(self, words):
        # Unique ids of every sub-token of every word, so no token is counted twice.
        ids = set()
        for word in words:
            ids.update(self.tokenizer.convert_tokens_to_ids(self.tokenizer.tokenize(word)))
        return ids

    @torch.no_grad()
    def __call__(self, context, quetion, answer, generate=True, max_new_tokens=4,
                 num_beams=2, do_sample=False, num_return_sequences=1):
        prompt = self.chat_Format(context, quetion, answer)
        # A single prompt needs no padding; right-padding would also move the last position read below.
        enc = self.tokenizer(prompt, add_special_tokens=True, return_tensors="pt")
        device = next(self.model.parameters()).device
        input_ids = enc["input_ids"].to(device)
        attention_mask = enc["attention_mask"].to(device)

        response = ""
        if generate:
            outputs = self.model.generate(
                input_ids=input_ids, attention_mask=attention_mask,
                max_new_tokens=max_new_tokens, num_beams=num_beams,
                do_sample=do_sample, num_return_sequences=num_return_sequences,
            )
            # Decodes the prompt plus the generated continuation.
            response = self.tokenizer.batch_decode(outputs, skip_special_tokens=True)

        # Next-token distribution right after "#response:".
        logits = self.model(input_ids=input_ids, attention_mask=attention_mask).logits[0, -1]
        probs = F.softmax(logits.float(), dim=-1)

        yes_ids = self._token_ids(self.YES_WORDS)
        no_ids = self._token_ids(self.NO_WORDS)
        shared = yes_ids & no_ids  # e.g. a bare word-boundary piece; it should count for neither side
        p_yes = probs[sorted(yes_ids - shared)].sum()
        p_no = probs[sorted(no_ids - shared)].sum()
        true = (p_yes / (p_yes + p_no)).item()
        return {"response": response, "true": true}


context = """Large language models, such as GPT-4, are trained on vast amounts of text data to understand and generate human-like text. The deployment of these models involves several steps:

    Model Selection: Choosing a pre-trained model that fits the application's needs.
    Infrastructure Setup: Setting up the necessary hardware and software infrastructure to run the model efficiently, including cloud services, GPUs, and necessary libraries.
    Integration: Integrating the model into an application, which can involve setting up APIs or embedding the model directly into the software.
    Optimization: Fine-tuning the model for specific tasks or domains and optimizing it for performance and cost-efficiency.
    Monitoring and Maintenance: Ensuring the model performs well over time, monitoring for biases, and updating the model as needed."""
quetion = "What are the key considerations when choosing a cloud service provider for deploying a large language model like GPT-4?"
answer = """When choosing a cloud service provider for deploying a large language model like GPT-4, the key considerations include:
    Compute Power: Ensure the provider offers high-performance GPUs or TPUs capable of handling the computational requirements of the model.
    Scalability: The ability to scale resources up or down based on the application's demand to handle varying workloads efficiently.
    Cost: Analyze the pricing models to understand the costs associated with compute time, storage, data transfer, and any other services.
    Integration and Support: Availability of tools and libraries that support easy integration of the model into your applications, along with robust technical support and documentation.
    Security and Compliance: Ensure the provider adheres to industry standards for security and compliance, protecting sensitive data and maintaining privacy.
    Latency and Availability: Consider the geographical distribution of data centers to ensure low latency and high availability for your end-users.

By evaluating these factors, you can select a cloud service provider that aligns with your deployment needs, ensuring efficient and cost-effective operation of your large language model."""

# This example loads the Llama-2 variant of EM2: the base model first, then the adapter on top.
base_model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"  # float16 halves memory vs. float32
)
model = PeftModel.from_pretrained(base_model, "mohamedemam/Em2-llama-7b")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("mohamedemam/Em2-llama-7b", trust_remote_code=True)

pipe = MyPipeline(model, tokenizer)
print(pipe(context, quetion, answer, generate=True, max_new_tokens=4,
           num_beams=2, do_sample=False, num_return_sequences=1))
```
- **Output:** `{'response': ["Instruction:/n check answer is true or false of next quetion using context below:\n#context: Large language models, such as GPT-4, are trained on vast amounts of text data to understand and generate human-like text. The deployment of these models involves several steps:\n\n    Model Selection: Choosing a pre-trained model that fits the application's needs.\n    Infrastructure Setup: Setting up the necessary hardware and software infrastructure to run the model efficiently, including cloud services, GPUs, and necessary libraries.\n    Integration: Integrating the model into an application, which can involve setting up APIs or embedding the model directly into the software.\n    Optimization: Fine-tuning the model for specific tasks or domains and optimizing it for performance and cost-efficiency.\n    Monitoring and Maintenance: Ensuring the model performs well over time, monitoring for biases, and updating the model as needed..\n#quetion: What are the key considerations when choosing a cloud service provider for deploying a large language model like GPT-4?.\n#student answer: When choosing a cloud service provider for deploying a large language model like GPT-4, the key considerations include:\n    Compute Power: Ensure the provider offers high-performance GPUs or TPUs capable of handling the computational requirements of the model.\n    Scalability: The ability to scale resources up or down based on the application's demand to handle varying workloads efficiently.\n    Cost: Analyze the pricing models to understand the costs associated with compute time, storage, data transfer, and any other services.\n    Integration and Support: Availability of tools and libraries that support easy integration of the model into your applications, along with robust technical support and documentation.\n    Security and Compliance: Ensure the provider adheres to industry standards for security and compliance, protecting sensitive data and maintaining privacy.\n    Latency and Availability: Consider the geographical distribution of data centers to ensure low latency and high availability for your end-users.\n\nBy evaluating these factors, you can select a cloud service provider that aligns with your deployment needs, ensuring efficient and cost-effective operation of your large language model..\n#response:  true the answer is"], 'true': 0.943033754825592}`
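
The `true` value is not taken from the generated text: it is the probability mass the model assigns to "yes"-type next tokens, renormalized against "no"-type tokens. The sketch below illustrates that calculation in plain Python under simplifying assumptions: a hypothetical toy vocabulary of whole-word strings and made-up logits stand in for a real tokenizer and model, and `softmax` / `true_score` are illustrative names, not part of any library.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax over a {token: logit} mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def true_score(logits, yes_tokens, no_tokens):
    """P(yes) / (P(yes) + P(no)), each side summed once over its token set."""
    probs = softmax(logits)
    p_yes = sum(probs.get(t, 0.0) for t in set(yes_tokens))
    p_no = sum(probs.get(t, 0.0) for t in set(no_tokens))
    return p_yes / (p_yes + p_no)


# Hypothetical next-token logits after "#response:" (toy vocabulary, not real model output).
toy_logits = {"true": 2.0, "True": 1.0, "false": 0.0, "the": 3.0}
score = true_score(toy_logits, ["Yes", "yes", "True", "true"], ["No", "no", "False", "false"])
print(round(score, 2))  # 0.91
```

Renormalizing over the two sides discards the mass on unrelated tokens such as "the", so the score stays between 0 and 1 even when the most likely next token is neither verdict.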

### Chat Format Function
This function formats the input context, question, and answer into a specific structure for the model to process.

```python
def chat_Format(self, context, question, answer):
    # Same template as MyPipeline.chat_Format; "/n" and "quetion" are part of the prompt and kept verbatim.
    return ("Instruction:/n check answer is true or false of next quetion using context below:\n"
            "#context: " + context + ".\n#quetion: " + question
            + ".\n#student answer: " + answer + ".\n#response:")
```
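
To see the exact string the model receives, here is a standalone copy of the template applied to a short, hypothetical example. The function name `format_prompt` and the inputs are illustrative only; the wording reproduces `MyPipeline.chat_Format` from the pipeline above, where `/n` is a literal two-character sequence, not a newline.

```python
def format_prompt(context: str, question: str, answer: str) -> str:
    """Standalone copy of the pipeline's prompt template (wording kept verbatim)."""
    return ("Instruction:/n check answer is true or false of next quetion using context below:\n"
            "#context: " + context + ".\n#quetion: " + question
            + ".\n#student answer: " + answer + ".\n#response:")


# Hypothetical inputs for illustration.
prompt = format_prompt("Paris is the capital of France",
                       "What is the capital of France?",
                       "Paris")
print(prompt)
```

The template appends a period after each field, so a question ending in "?" becomes "?.", as in the sample output of the pipeline.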


## Configuration

### Dropout Probability for LoRA Layers
- **lora_dropout:** 0.05
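
For reference, in PEFT's LoRA layers the dropout is applied only to the input of the low-rank adapter branch, not to the frozen base weights, and it is inactive at inference time. With adapter rank $r$ and scaling factor $\alpha$ (neither is listed in this card), a LoRA-adapted linear layer computes:

$$
h = W_0 x + \frac{\alpha}{r} \, B A \, \mathrm{Dropout}_{p}(x), \qquad p = 0.05
$$

where $W_0$ is the frozen (here 4-bit quantized) weight and $A \in \mathbb{R}^{r \times d_{\text{in}}}$, $B \in \mathbb{R}^{d_{\text{out}} \times r}$ are the trainable adapter matrices.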

### Quantization Settings
- **use_4bit:** True
- **bnb_4bit_compute_dtype:** "float16"
- **bnb_4bit_quant_type:** "nf4"
- **use_nested_quant:** False
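
The names `use_4bit` and `use_nested_quant` look like training-script variables rather than library parameters; the sketch below assumes they map onto the keyword arguments of `transformers.BitsAndBytesConfig`, where "nested quantization" corresponds to `bnb_4bit_use_double_quant`. The helper `bnb_4bit_kwargs` is hypothetical and kept dependency-free so the mapping is easy to inspect.

```python
def bnb_4bit_kwargs(use_4bit=True, compute_dtype="float16",
                    quant_type="nf4", use_nested_quant=False):
    """Map this card's quantization settings to BitsAndBytesConfig keyword names (assumed mapping)."""
    return {
        "load_in_4bit": use_4bit,
        "bnb_4bit_compute_dtype": compute_dtype,        # dtype used for matmuls on de-quantized weights
        "bnb_4bit_quant_type": quant_type,              # "nf4" = 4-bit NormalFloat
        "bnb_4bit_use_double_quant": use_nested_quant,  # also quantize the quantization constants
    }


kwargs = bnb_4bit_kwargs()
print(kwargs)
```

With `transformers` installed, these keywords can be passed as `BitsAndBytesConfig(**kwargs)` and the result given to `AutoModelForCausalLM.from_pretrained(..., quantization_config=...)`; `BitsAndBytesConfig` accepts the compute dtype as a string such as `"float16"`.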

### Output Directory
- **output_dir:** "./results"

### Training Parameters
- **num_train_epochs:** 1
- **fp16:** False
- **bf16:** False
- **per_device_train_batch_size:** 1
- **per_device_eval_batch_size:** 4
- **gradient_accumulation_steps:** 8
- **gradient_checkpointing:** True
- **max_grad_norm:** 0.3
- **learning_rate:** 5e-5
- **weight_decay:** 0.001
- **optim:** "paged_adamw_8bit"
- **lr_scheduler_type:** "constant"
- **max_steps:** -1
- **warmup_ratio:** 0.03
- **group_by_length:** True
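
With `per_device_train_batch_size: 1` and `gradient_accumulation_steps: 8`, gradients from 8 examples are accumulated before each optimizer update on each device; the number of GPUs is not stated, so the sketch below assumes one. Because `max_steps: -1` lets `num_train_epochs` set the run length, the number of updates follows from the dataset size. Note also that in the Hugging Face `Trainer` the `"constant"` scheduler applies no warmup, so `warmup_ratio: 0.03` only takes effect with a schedule such as `"constant_with_warmup"`. The helper names and the 10,000-example dataset size are hypothetical.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def approx_steps_per_epoch(num_examples: int, per_device_batch: int,
                           grad_accum_steps: int, num_devices: int = 1) -> int:
    """Rough count of optimizer updates per epoch (the Trainer's exact rounding may differ)."""
    return math.ceil(num_examples / effective_batch_size(per_device_batch, grad_accum_steps, num_devices))


batch = effective_batch_size(per_device_batch=1, grad_accum_steps=8)  # values listed above
print(batch)  # 8
# 10_000 is a hypothetical dataset size, not the size of the EM2 training set.
print(approx_steps_per_epoch(10_000, per_device_batch=1, grad_accum_steps=8))  # 1250
```

Since `logging_steps` and `save_steps` count optimizer updates, on a single device the values listed under Logging and Saving correspond to roughly 200 examples between log entries (25 × 8) and 800 between checkpoints (100 × 8).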

### Logging and Saving
- **save_steps:** 100
- **logging_steps:** 25
- **max_seq_length:** False