---
library_name: transformers
tags: []
---

# Model Card for OrpoLlama-3.2-1B-V1

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B), trained with the ORPO (Odds Ratio Preference Optimization) trainer on the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset. Only 1,000 samples were used, to keep the training run short.
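
As a rough illustration, the training setup looked approximately like the sketch below, which assumes TRL's `ORPOTrainer`; the hyperparameters shown are illustrative placeholders, not the exact values used for this model.

```python
# A minimal sketch of the ORPO fine-tuning described above, assuming TRL's
# ORPOTrainer. Hyperparameters are illustrative, not the exact values used.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Only the first 1,000 preference pairs, as described above.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train[:1000]")

config = ORPOConfig(
    output_dir="OrpoLlama-3.2-1B-V1",
    beta=0.1,                        # odds-ratio penalty weight (assumed)
    max_length=1024,                 # assumed context budget
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
)
trainer.train()
```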

## Model Details

### Model Description

The base model [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) was fine-tuned with ORPO on a small subset of the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset. The Llama 3.2 family of text-only models is optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version is aimed at improving the model's understanding of prompt context and thereby the interpretability of its outputs.

- **Finetuned from model:** [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- **Model size:** 1 billion parameters
- **Fine-tuning method:** ORPO
- **Dataset:** [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)
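
A minimal generation sketch follows; the repo id `bhuvana-ak7/OrpoLlama-3.2-1B-V1` is assumed from this model page.

```python
# Minimal generation sketch; the repo id is assumed from this model page.
from transformers import pipeline

generator = pipeline("text-generation", model="bhuvana-ak7/OrpoLlama-3.2-1B-V1")
out = generator("Summarize the benefits of preference-based fine-tuning:", max_new_tokens=64)
print(out[0]["generated_text"])
```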

## Evaluation

The model was evaluated on the following benchmarks:

| Tasks     | Version | Filter | n-shot | Metric            |    Value |   Stderr |
|-----------|---------|--------|--------|-------------------|---------:|---------:|
| hellaswag | 1       | none   | 0      | acc               |   0.4772 | ± 0.0050 |
|           |         | none   | 0      | acc_norm          |   0.6366 | ± 0.0048 |
| tinyMMLU  | 0       | none   | 0      | acc_norm          |   0.4306 | ± N/A    |
| eq_bench  | 2.1     | none   | 0      | eqbench           | -12.9709 | ± 2.9658 |
|           |         | none   | 0      | percent_parseable |  92.9825 | ± 1.9592 |
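
The table matches the output format of EleutherAI's lm-evaluation-harness. Assuming that harness produced these numbers, a run along the lines of the sketch below (task names, model id, and settings are assumptions) should give comparable results.

```python
# Hedged reproduction sketch using lm-evaluation-harness's Python API;
# task names, model id, and zero-shot setting are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=bhuvana-ak7/OrpoLlama-3.2-1B-V1",
    tasks=["hellaswag", "tinyMMLU", "eq_bench"],
    num_fewshot=0,
)
print(results["results"])
```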