---
library_name: transformers
tags: []
---

# Model Card for Model ID

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained with the ORPO (Odds Ratio Preference Optimization) Trainer on the mlabonne/orpo-dpo-mix-40k dataset. Only 1,000 data samples were used so that training with ORPO could be run quickly.

## Model Details

### Model Description

The base model meta-llama/Llama-3.2-1B has been fine-tuned using ORPO on a small subset of the mlabonne/orpo-dpo-mix-40k dataset. The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version aims to improve the model's understanding of the context in prompts and thereby increase its interpretability.

- **Finetuned from model:** meta-llama/Llama-3.2-1B
- **Model size:** 1 billion parameters
- **Fine-tuning method:** ORPO
- **Dataset:** mlabonne/orpo-dpo-mix-40k

## Evaluation

The model was evaluated on the following benchmarks, with the following performance metrics:

| Tasks     | Version | Filter | n-shot | Metric            |   |    Value |   | Stderr |
|-----------|--------:|--------|-------:|-------------------|---|---------:|---|-------:|
| hellaswag |       1 | none   |      0 | acc               | ↑ |   0.4772 | ± | 0.0050 |
|           |         | none   |      0 | acc_norm          | ↑ |   0.6366 | ± | 0.0048 |
| tinyMMLU  |       0 | none   |      0 | acc_norm          | ↑ |   0.4306 | ± |    N/A |
| eq_bench  |     2.1 | none   |      0 | eqbench           | ↑ | -12.9709 | ± | 2.9658 |
|           |         | none   |      0 | percent_parseable | ↑ |  92.9825 | ± | 1.9592 |
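For intuition, ORPO augments the standard SFT loss with an odds-ratio penalty that pushes the model to prefer the chosen response over the rejected one. The sketch below illustrates only that odds-ratio term, using made-up probabilities (`p_chosen` and `p_rejected` are hypothetical values, not measurements from this model); it is a minimal illustration of the formula, not the TRL implementation.

```python
import math

def odds(p):
    """Odds of generating a response, given its probability p."""
    return p / (1.0 - p)

def orpo_odds_ratio_loss(p_chosen, p_rejected):
    """Odds-ratio term: -log sigmoid(log(odds(chosen) / odds(rejected)))."""
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    sigmoid = 1.0 / (1.0 + math.exp(-log_odds_ratio))
    return -math.log(sigmoid)

# When the model already favors the chosen response the penalty is small;
# when it favors the rejected response the penalty grows.
print(round(orpo_odds_ratio_loss(0.6, 0.3), 4))  # → 0.2513
print(round(orpo_odds_ratio_loss(0.3, 0.6), 4))  # → 1.5041
```

In the full ORPO objective this term is scaled by a weight (lambda/beta in the TRL `ORPOTrainer`) and added to the negative log-likelihood on the chosen responses, which is why no separate reference model is needed.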