---
library_name: transformers
tags: []
---
# Model Card for OrpoLlama-3.2-1B-V1
This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained with the ORPO (Odds Ratio Preference Optimization) trainer on the mlabonne/orpo-dpo-mix-40k dataset.
Only 1,000 samples from the dataset were used, to keep training fast.
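
The card does not say how the 1,000 samples were selected. The following is a minimal sketch, assuming a seeded random draw from the `train` split; the helper names `pick_subset_indices` and `load_orpo_subset`, the seed value, and the split name are illustrative assumptions, not the author's actual procedure.

```python
import random


def pick_subset_indices(dataset_size, n_samples=1000, seed=42):
    """Return n_samples distinct, sorted row indices drawn reproducibly."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return sorted(rng.sample(range(dataset_size), n_samples))


def load_orpo_subset(n_samples=1000, seed=42):
    """Load the preference dataset and keep a small random subset.

    Assumption: the data lives in a split named "train".
    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily; only needed here

    dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
    indices = pick_subset_indices(len(dataset), n_samples, seed)
    return dataset.select(indices)
```

Calling `load_orpo_subset()` would return a 1,000-row subset that can be passed to a preference trainer. Fixing the seed keeps the subset identical across runs.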
## Model Details
### Model Description
The base model, meta-llama/Llama-3.2-1B, was fine-tuned with ORPO on a 1,000-sample subset of the mlabonne/orpo-dpo-mix-40k dataset.
The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks; note that meta-llama/Llama-3.2-1B itself is the pretrained base checkpoint, not the instruction-tuned variant.
This fine-tuned version aims to improve how well the model understands the context in prompts, and thereby its interpretability.
- **Finetuned from model:** meta-llama/Llama-3.2-1B
- **Model size:** 1 billion parameters
- **Fine-tuning method:** ORPO
- **Dataset:** mlabonne/orpo-dpo-mix-40k
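
For readers unfamiliar with the fine-tuning method listed above, the sketch below illustrates the ORPO objective for a single preference pair: a supervised loss on the chosen response plus a weighted odds-ratio penalty that pushes the chosen response above the rejected one. It is a simplified pure-Python illustration, not the training code used for this model; the function names and the weight `lam=0.1` are assumptions chosen for the example.

```python
import math


def log_odds(avg_logp):
    """log(p / (1 - p)) where p = exp(avg_logp). Requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """ORPO loss for one (chosen, rejected) pair.

    Inputs are per-token log-probabilities of each response given the prompt.
    `lam` weights the odds-ratio term (illustrative value).
    """
    # Length-normalised sequence log-likelihoods.
    logp_chosen = sum(chosen_token_logps) / len(chosen_token_logps)
    logp_rejected = sum(rejected_token_logps) / len(rejected_token_logps)

    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -logp_chosen

    # Odds-ratio term: -log(sigmoid(z)) == log(1 + exp(-z)).
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    or_term = math.log1p(math.exp(-log_odds_ratio))

    return sft_term + lam * or_term
```

Because the penalty is computed from the policy's own likelihoods, ORPO does not need a separate frozen reference model, which is part of why it suits quick fine-tuning runs on small models.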
## Evaluation
The model was evaluated zero-shot on the following benchmarks:
| Tasks     | Version | Filter | n-shot | Metric            |   | Value    |   | Stderr |
|-----------|--------:|--------|-------:|-------------------|---|---------:|---|-------:|
| hellaswag |       1 | none   |      0 | acc               | ↑ |   0.4772 | ± | 0.0050 |
|           |         | none   |      0 | acc_norm          | ↑ |   0.6366 | ± | 0.0048 |
| tinyMMLU  |       0 | none   |      0 | acc_norm          | ↑ |   0.4306 | ± |    N/A |
| eq_bench  |     2.1 | none   |      0 | eqbench           | ↑ | -12.9709 | ± | 2.9658 |
|           |         | none   |      0 | percent_parseable | ↑ |  92.9825 | ± | 1.9592 |
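
To read the `Value ± Stderr` columns as ranges, one option is an approximate 95% confidence interval. The sketch below assumes a normal approximation (value ± 1.96 × stderr); the function name is illustrative, and tinyMMLU is omitted because no standard error is reported for it.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation."""
    half_width = z * stderr
    return (value - half_width, value + half_width)


# Values and standard errors copied from the evaluation table.
results = {
    "hellaswag/acc": (0.4772, 0.0050),
    "hellaswag/acc_norm": (0.6366, 0.0048),
    "eq_bench/eqbench": (-12.9709, 2.9658),
    "eq_bench/percent_parseable": (92.9825, 1.9592),
}

for name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (95% CI {low:.4f} to {high:.4f})")
```

For example, the HellaSwag accuracy of 0.4772 with a standard error of 0.0050 corresponds to an interval of roughly 0.4674 to 0.4870.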