---
language:
- en
license: other
library_name: transformers
tags:
- orpo
- llama 3
- rlhf
- sft
datasets:
- mlabonne/orpo-dpo-mix-40k
---
OrpoLlama-3-8B
This is an ORPO fine-tune of meta-llama/Meta-Llama-3-8B on mlabonne/orpo-dpo-mix-40k, created for this article.
It's a successful fine-tune that follows the ChatML template!
Try the demo: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B
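For readers new to ORPO (Odds Ratio Preference Optimization): it adds an odds-ratio penalty to the usual supervised fine-tuning loss, so no separate reference model is needed. Below is a minimal, framework-free sketch of the per-example objective. It assumes the inputs are length-averaged log-probabilities of the chosen and rejected responses. The weight `lam=0.1` is an illustrative value, not necessarily the setting used to train this model.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) computed from log p; assumes avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def odds_ratio_loss(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """-log sigmoid(log odds(chosen) - log odds(rejected)); small when chosen is favored."""
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    return -log_sigmoid(z)


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """SFT negative log-likelihood on the chosen answer plus the weighted odds-ratio term."""
    nll = -chosen_avg_logp
    return nll + lam * odds_ratio_loss(chosen_avg_logp, rejected_avg_logp)


# Hypothetical log-probabilities: the chosen answer is more likely than the rejected one.
print(orpo_loss(-0.4, -1.2))
```

When both responses are equally likely, the odds-ratio term equals log 2. It shrinks as the chosen response becomes relatively more likely than the rejected one.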
Application
This model has a context window of 8k tokens. It was trained with the ChatML template.
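ChatML wraps every conversation turn in `<|im_start|>{role}` and `<|im_end|>` markers. The helper below is a minimal, hypothetical sketch of that layout in plain Python, useful for inspecting prompts by eye. It is not this model's actual template. In practice, rely on `tokenizer.apply_chat_template`, because the bundled template may differ in details such as special tokens.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}] dicts in the ChatML layout (illustrative helper)."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        prompt += "<|im_start|>assistant\n"
    return prompt


print(to_chatml([{"role": "user", "content": "What is a large language model?"}]))
```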
Evaluation
Nous
OrpoLlama-3-8B outperforms Llama-3-8B-Instruct on the GPT4All and TruthfulQA benchmarks.
Evaluation performed using LLM AutoEval, see the entire leaderboard here.
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| meta-llama/Meta-Llama-3-8B-Instruct | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| mlabonne/OrpoLlama-3-8B | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
| mlabonne/OrpoLlama-3-8B-1k | 46.76 | 31.56 | 70.19 | 48.11 | 37.17 |
| meta-llama/Meta-Llama-3-8B | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
mlabonne/OrpoLlama-3-8B-1k corresponds to a version of this model trained on 1k samples (you can see the parameters in this article).
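The table can be sanity-checked in a few lines of Python. This sketch assumes the Average column is the unweighted mean of the four benchmark scores, which agrees with the reported averages to within rounding. It also lists the benchmarks on which one model strictly beats another. The scores are copied from the table; the helper names are hypothetical.

```python
BENCHMARKS = ("AGIEval", "GPT4All", "TruthfulQA", "Bigbench")

# Per-benchmark scores, in BENCHMARKS order, copied from the Nous table.
SCORES = {
    "meta-llama/Meta-Llama-3-8B-Instruct": (41.22, 69.86, 51.65, 42.64),
    "mlabonne/OrpoLlama-3-8B": (34.17, 70.59, 52.39, 37.36),
    "mlabonne/OrpoLlama-3-8B-1k": (31.56, 70.19, 48.11, 37.17),
    "meta-llama/Meta-Llama-3-8B": (31.1, 69.95, 43.91, 36.7),
}
REPORTED_AVERAGE = {
    "meta-llama/Meta-Llama-3-8B-Instruct": 51.34,
    "mlabonne/OrpoLlama-3-8B": 48.63,
    "mlabonne/OrpoLlama-3-8B-1k": 46.76,
    "meta-llama/Meta-Llama-3-8B": 45.42,
}


def average(scores):
    """Unweighted mean of the benchmark scores (assumed definition of 'Average')."""
    return sum(scores) / len(scores)


def wins(model, baseline):
    """Benchmarks on which `model` scores strictly higher than `baseline`."""
    return [b for b, m, r in zip(BENCHMARKS, SCORES[model], SCORES[baseline]) if m > r]


for name, scores in SCORES.items():
    print(f"{name}: computed {average(scores):.3f}, reported {REPORTED_AVERAGE[name]}")

print(wins("mlabonne/OrpoLlama-3-8B", "meta-llama/Meta-Llama-3-8B-Instruct"))
```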
Open LLM Leaderboard
TBD.
Training curves
You can find the experiment on W&B at this address.
Usage
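```python
# Notebook syntax; in a terminal, run the same command without the leading "!".
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/OrpoLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation with the model's own chat template and open an assistant turn.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in float16 and place it on the available device(s).
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; "generated_text" holds the prompt followed by the reply.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```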