---
license: llama3
---
|
[Llama 3 8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) fine-tuned on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) with [ORPO](https://arxiv.org/abs/2403.07691).
|
The maximum sequence length was reduced to 1,024 tokens, and LoRA (r=16) with 4-bit quantization were used to improve memory efficiency.
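For concreteness, here is a minimal sketch of how such a run could be set up with TRL's `ORPOTrainer`. Only the base model, the dataset, the LoRA rank (r=16), the 4-bit quantization, and the 1,024-token max length come from this card; every other hyperparameter (alpha, dropout, batch size, learning rate, prompt formatting) is an illustrative assumption, not the recipe actually used.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"

# 4-bit NF4 quantization keeps the 8B weights small enough for a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # base model ships without a pad token

# LoRA adapters with rank 16, as stated above; alpha/dropout are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

def to_strings(row):
    # The dataset stores chosen/rejected as chat-message lists, while
    # ORPOTrainer expects plain strings. A real run would apply a chat
    # template here rather than this naive concatenation.
    prompt = "\n".join(m["content"] for m in row["chosen"][:-1])
    return {
        "prompt": prompt,
        "chosen": row["chosen"][-1]["content"],
        "rejected": row["rejected"][-1]["content"],
    }

dataset = dataset.map(to_strings)

# max_length=1024 matches the card; the remaining hyperparameters are guesses.
orpo_args = ORPOConfig(
    output_dir="./orpo-llama-3-8b",
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```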
|
|
|
| **Benchmark**          | **Llama 3 8B** | **Llama 3 8B Instruct** | **Llama 3 8B ORPO V1** | **Llama 3 8B ORPO V2 (WIP)** |
|------------------------|:--------------:|:-----------------------:|:----------------------:|:----------------------------:|
| **MMLU**               | 62.12          | 63.92                    | 61.87                  |                              |
| **BoolQ**              | 81.04          | 83.21                    | 82.42                  |                              |
| **Winogrande**         | 73.24          | 72.06                    | 74.43                  |                              |
| **ARC-Challenge**      | 53.24          | 56.91                    | 52.90                  |                              |
| **TriviaQA**           | 63.33          | 51.09                    | 63.93                  |                              |
| **GSM-8K (flexible)**  | 50.27          | 75.13                    | 52.16                  |                              |
| **SQuAD V2 (F1)**      | 32.48          | 29.68                    | 33.68                  |                              |
| **LogiQA**             | 29.23          | 32.87                    | 30.26                  |                              |
|
All scores were obtained with [lm-evaluation-harness v0.4.2](https://github.com/EleutherAI/lm-evaluation-harness).
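For reference, the sketch below shows how such benchmarks might be re-run with the harness's Python API. `lm_eval.simple_evaluate` is part of the v0.4.x API, but the task names, batch size, and other settings here are assumptions; the card does not record the exact evaluation flags used.

```python
# Hedged sketch: re-running the benchmarks with lm-evaluation-harness v0.4.x.
# Task names and settings are assumptions, not the exact configuration used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B,dtype=bfloat16",
    tasks=["mmlu", "boolq", "winogrande", "arc_challenge",
           "triviaqa", "gsm8k", "squadv2", "logiqa"],
    batch_size=8,  # assumption
)

for task, metrics in results["results"].items():
    print(task, metrics)
```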
|