---
license: llama3
---
|
[Llama 3 8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) fine-tuned on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) with [ORPO](https://arxiv.org/abs/2403.07691).
|
The maximum sequence length was reduced to 1,024 tokens, and LoRA (r=16) with 4-bit quantization were used to improve memory efficiency.
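For concreteness, here is a minimal sketch of how such a run could be set up with TRL's `ORPOTrainer`. Only the base model, the dataset, the LoRA rank (r=16), the 4-bit quantization, and the 1,024-token max length come from this card; every other hyperparameter (alpha, dropout, batch size, learning rate, prompt formatting) is an illustrative assumption, not the recipe actually used.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"

# 4-bit NF4 quantization keeps the 8B weights small enough for a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # base model ships without a pad token

# LoRA adapters with rank 16, as stated above; alpha/dropout are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

def to_strings(row):
    # The dataset stores chosen/rejected as chat-message lists, while
    # ORPOTrainer expects plain strings. A real run would apply a chat
    # template here rather than this naive concatenation.
    prompt = "\n".join(m["content"] for m in row["chosen"][:-1])
    return {
        "prompt": prompt,
        "chosen": row["chosen"][-1]["content"],
        "rejected": row["rejected"][-1]["content"],
    }

dataset = dataset.map(to_strings)

# max_length=1024 matches the card; the remaining hyperparameters are guesses.
orpo_args = ORPOConfig(
    output_dir="./orpo-llama-3-8b",
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```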
|
|
|
| **Benchmark**          | **Llama 3 8B** | **Llama 3 8B Instruct** | **Llama 3 8B ORPO V1** | **Llama 3 8B ORPO V2 (WIP)** |
|------------------------|:--------------:|:-----------------------:|:----------------------:|:----------------------------:|
| **MMLU**               | 62.12          | 63.92                    | 61.87                  |                              |
| **BoolQ**              | 81.04          | 83.21                    | 82.42                  |                              |
| **Winogrande**         | 73.24          | 72.06                    | 74.43                  |                              |
| **ARC-Challenge**      | 53.24          | 56.91                    | 52.90                  |                              |
| **TriviaQA**           | 63.33          | 51.09                    | 63.93                  |                              |
| **GSM-8K (flexible)**  | 50.27          | 75.13                    | 52.16                  |                              |
| **SQuAD V2 (F1)**      | 32.48          | 29.68                    | 33.68                  |                              |
| **LogiQA**             | 29.23          | 32.87                    | 30.26                  |                              |
|
All scores were obtained with [lm-evaluation-harness v0.4.2](https://github.com/EleutherAI/lm-evaluation-harness).
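For reference, the sketch below shows how such benchmarks might be re-run with the harness's Python API. `lm_eval.simple_evaluate` is part of the v0.4.x API, but the task names, batch size, and other settings here are assumptions; the card does not record the exact evaluation flags used.

```python
# Hedged sketch: re-running the benchmarks with lm-evaluation-harness v0.4.x.
# Task names and settings are assumptions, not the exact configuration used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B,dtype=bfloat16",
    tasks=["mmlu", "boolq", "winogrande", "arc_challenge",
           "triviaqa", "gsm8k", "squadv2", "logiqa"],
    batch_size=8,  # assumption
)

for task, metrics in results["results"].items():
    print(task, metrics)
```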
|