Edit model card

Model Card for Model ID

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, using ORPO (Optimized Regularization for Prompt Optimization) Trainer. This model is fine-tuned using the mlabonne/orpo-dpo-mix-40k dataset. Only 1000 data samples were used to train quickly using ORPO.

Model Details

Model Description

The base model meta-llama/Llama-3.2-1B has been fine-tuned using ORPO on a few samples of mlabonne/orpo-dpo-mix-40k dataset. The Llama 3.2 instruction-tuned text-only model is optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version is aimed at improving the understanding of the context in prompts and thereby increasing the interpretability of the model.

  • Finetuned from model [meta-llama/Llama-3.2-1B]
  • Model Size: 1 Billion parameters
  • Fine-tuning Method: ORPO
  • Dataset: mlabonne/orpo-dpo-mix-40k

Evaluation

The model was evaluated on the following benchmarks, with the following performance metrics:

Tasks Version Filter n-shot Metric Value Stderr
hellaswag 1 none 0 acc ↑ 0.4772 ± 0.0050
none 0 acc_norm ↑ 0.6366 ± 0.0048
tinyMMLU 0 none 0 acc_norm ↑ 0.4306 ± N/A
eq_bench 2.1 none 0 eqbench ↑ -12.9709 ± 2.9658
none 0 percent_parseable ↑ 92.9825 ± 1.9592
Downloads last month
39
Safetensors
Model size
1.24B params
Tensor type
FP16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.