Model Card for Model ID

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained with the ORPO (Odds Ratio Preference Optimization) trainer on the mlabonne/orpo-dpo-mix-40k dataset. To keep training fast, only 1,000 samples from the dataset were used.

Model Details

Model Description

The base model meta-llama/Llama-3.2-1B was fine-tuned with ORPO on a small subset of the mlabonne/orpo-dpo-mix-40k dataset. Meta describes the Llama 3.2 instruction-tuned text-only models as optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version aims to improve the model's understanding of the context in prompts and thereby increase the interpretability of the model.

Finetuned from model: meta-llama/Llama-3.2-1B
Model Size: 1 billion parameters
Fine-tuning Method: ORPO
Dataset: mlabonne/orpo-dpo-mix-40k
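
To make the fine-tuning method more concrete, here is a minimal sketch of the ORPO objective for a single preference pair, as it is usually stated: a supervised (negative log-likelihood) term on the chosen response plus a weighted odds-ratio penalty that favors the chosen response over the rejected one. This is an illustration under stated assumptions, not the training code used for this model. The function names, the `lam` weight, and its default value of 0.1 are hypothetical, and the inputs are assumed to be length-normalized (per-token average) log-probabilities strictly below zero.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    # log1p(-exp(log p)) equals log(1 - p)
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one (chosen, rejected) pair.

    chosen_logp / rejected_logp: average per-token log-probability of each
    response under the model (assumed inputs, must be negative).
    lam: hypothetical weight on the odds-ratio term.
    """
    # Supervised fine-tuning term: negative log-likelihood of the chosen response
    nll = -chosen_logp
    # Log odds ratio between the chosen and rejected responses
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * odds_ratio_term


if __name__ == "__main__":
    # The loss shrinks as the chosen response becomes more likely than the rejected one
    print(orpo_loss(chosen_logp=-0.5, rejected_logp=-2.0))
    print(orpo_loss(chosen_logp=-0.5, rejected_logp=-1.0))
```

Because the penalty depends only on the ratio of odds under the model being trained, ORPO needs no separate reference model, which is one reason it suits a quick run on a small sample.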

Evaluation

The model was evaluated on the benchmarks below. The table reports each metric with its standard error:

Tasks	Version	Filter	Metric	Direction	Value		Stderr
hellaswag	1	none	acc	↑	0.4772	±	0.0050
hellaswag	1	none	acc_norm	↑	0.6366	±	0.0048
tinyMMLU	0	none	acc_norm	↑	0.4306	±	N/A
eq_bench	2.1	none	eqbench	↑	-12.9709	±	2.9658
eq_bench	2.1	none	percent_parseable	↑	92.9825	±	1.9592
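
As a reading aid for the Value and Stderr columns, the sketch below turns each reported pair into an approximate 95% confidence interval. It assumes the standard errors can be treated under a normal approximation (z = 1.96), which is an assumption made here and not something the evaluation output states. The helper name `confidence_interval` is hypothetical, and tinyMMLU is left out because no standard error was reported for it.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval under a normal approximation (assumed)."""
    half_width = z * stderr
    return (value - half_width, value + half_width)


# Values and standard errors copied from the results table above
results = {
    "hellaswag acc": (0.4772, 0.0050),
    "hellaswag acc_norm": (0.6366, 0.0048),
    "eq_bench eqbench": (-12.9709, 2.9658),
    "eq_bench percent_parseable": (92.9825, 1.9592),
}

if __name__ == "__main__":
    for name, (value, stderr) in results.items():
        low, high = confidence_interval(value, stderr)
        print(f"{name}: {value:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```

The wide interval for eq_bench reflects its comparatively large standard error, so that score should be read with more caution than the hellaswag results.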