---
language:
- en
license: llama3
---

# Llama-3-Instruct-8B-SimPO-ExPO

This is the extrapolated (ExPO) model based on [`princeton-nlp/Llama-3-Instruct-8B-SimPO`](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO) and [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), as described in the paper "[Weak-to-Strong Extrapolation Expedites Alignment](https://arxiv.org/abs/2404.16792)".

Specifically, we obtain this model by extrapolating **(alpha = 0.3)** from the weights of the SFT and DPO/RLHF checkpoints, achieving superior alignment with human preference.
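
The extrapolation step can be illustrated with a minimal sketch. It assumes the update takes the form `aligned + alpha * (aligned - sft)` applied to each parameter, which is our reading of the paper and not code taken from the official repo. The function name `extrapolate_state_dict` and the toy parameter names are hypothetical.

```python
def extrapolate_state_dict(sft_state, aligned_state, alpha=0.3):
    """Sketch of weight extrapolation (assumed form, not the official code).

    Both arguments map parameter names to values that support arithmetic
    (plain floats here; NumPy arrays or PyTorch tensors would also work).
    """
    if sft_state.keys() != aligned_state.keys():
        raise ValueError("checkpoints must share the same parameter names")
    # Move further along the direction from the SFT weights to the aligned weights.
    return {
        name: aligned_state[name] + alpha * (aligned_state[name] - sft_state[name])
        for name in aligned_state
    }


# Toy checkpoints with hypothetical parameter names.
sft = {"layer.weight": 1.0, "layer.bias": 0.0}
aligned = {"layer.weight": 2.0, "layer.bias": -1.0}
expo = extrapolate_state_dict(sft, aligned, alpha=0.3)
```

With `alpha = 0` the result equals the aligned checkpoint, and larger values push the weights further past it along the same direction.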

This extrapolated model achieves a **40.6%** win rate and a **45.8%** LC win rate on **AlpacaEval 2.0**, outperforming the original `Llama-3-Instruct-8B-SimPO`, which scores 40.5% and 44.7%, respectively.

## Evaluation Results

Evaluation results on the **AlpacaEval 2.0** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_alpaca)):

|                                      | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| ------------------------------------ | -------------- | ----------------- | ----------------- | -------------------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.7%           | 10.0%             | **10.6%**         | **13.6%**            |
| `HuggingFaceH4/zephyr-7b-beta`       | 10.2%          | 13.2%             | **11.1%**         | **14.0%**            |
| `berkeley-nest/Starling-LM-7B-alpha` | 15.0%          | 18.3%             | **18.2%**         | **19.5%**            |
| `Nexusflow/Starling-LM-7B-beta`      | 26.6%          | 25.8%             | **29.6%**         | **26.4%**            |
| `snorkelai/Snorkel-Mistral-PairRM`   | 24.7%          | 24.0%             | **28.8%**         | **26.4%**            |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 29.2%          | 36.0%             | **32.7%**         | **37.8%**            |
| `internlm/internlm2-chat-1.8b`       | 3.8%           | 4.0%              | **5.2%**          | **4.3%**             |
| `internlm/internlm2-chat-7b`         | 20.5%          | 18.3%             | **28.1%**         | **22.7%**            |
| `internlm/internlm2-chat-20b`        | 36.1%          | 24.9%             | **46.2%**         | **27.2%**            |
| `allenai/tulu-2-dpo-7b`              | 8.5%           | 10.2%             | **11.5%**         | **11.7%**            |
| `allenai/tulu-2-dpo-13b`             | 11.2%          | 15.5%             | **15.6%**         | **17.6%**            |
| `allenai/tulu-2-dpo-70b`             | 15.4%          | 21.2%             | **23.0%**         | **25.7%**            |

Evaluation results on the **MT-Bench** benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_mtbench)):

|                                      | Original | + ExPO   |
| ------------------------------------ | -------- | -------- |
| `HuggingFaceH4/zephyr-7b-alpha`      | 6.85     | **6.87** |
| `HuggingFaceH4/zephyr-7b-beta`       | 7.02     | **7.06** |
| `berkeley-nest/Starling-LM-7B-alpha` | 7.82     | **7.91** |
| `Nexusflow/Starling-LM-7B-beta`      | 8.10     | **8.18** |
| `snorkelai/Snorkel-Mistral-PairRM`   | 7.63     | **7.69** |
| `RLHFlow/LLaMA3-iterative-DPO-final` | 8.08     | **8.45** |
| `internlm/internlm2-chat-1.8b`       | 5.17     | **5.26** |
| `internlm/internlm2-chat-7b`         | 7.72     | **7.80** |
| `internlm/internlm2-chat-20b`        | 8.13     | **8.26** |
| `allenai/tulu-2-dpo-7b`              | 6.35     | **6.38** |
| `allenai/tulu-2-dpo-13b`             | 7.00     | **7.26** |
| `allenai/tulu-2-dpo-70b`             | 7.79     | **8.03** |