Files changed (1)
README.md +6 -5

README.md CHANGED
@@ -18,9 +18,10 @@ base_model: mistralai/Mistral-7B-v0.1
 # Model Card for Merlinite-7B-pt 🔥
 
 ### Overview
-We introduce **Merlinite-7B-pt**, a strong open-source chat model, aligned using AI feedback **without using any human annotation or proprietary model**.
-
-**Merlinite-7B-pt** is first supervised-finetuned (SFT) via [LAB](https://arxiv.org/abs/2403.01081) using Mistral-7B-v0.1 as base model, and then preference-tuned via AI feedback. Our preference tuning recipe uses the DPO reward from Mixtral-8x7B-Instruct-v0.1 as the proxy for human preferences, and applies iterative rejection sampling to finetune the SFT policy. We show that DPO log-ratios can serve as a reliable reward signal, showing clear correlation between reward improvements and Mt-Bench improvements.
+We introduce **Merlinite-7B-pt**, a strong open-source chat model, aligned using AI feedback **without using any proprietary models or human annotation**.
+- **Merlinite-7B-pt** is first supervised-finetuned (SFT) via [LAB](https://arxiv.org/abs/2403.01081) using Mistral-7B-v0.1 as the base model, and then preference-tuned via AI feedback.
+- Our preference-tuning recipe uses the DPO reward from Mixtral-8x7B-Instruct-v0.1 as the proxy for human preferences, and applies iterative rejection sampling to finetune the SFT policy.
+- We show that DPO log-ratios can serve as a reliable reward signal, with a clear correlation between reward improvements and MT-Bench improvements.
 
 
 The official **Merlinite-7B-pt** achieves **7.96** on MT-Bench, surpassing Mistral-7B-Instruct-v0.1 and Llama2-70b-chat, and is comparable to small proprietary models like GPT3.5-Turbo-0314 and Claude-v1. It also exhibits superior instruction-following and human preference compared to the SFT Merlinite-7B model.
@@ -39,14 +40,14 @@ The official **Merlinite-7B-pt** achieves **7.96** on MT-Bench, surpassing Mistr
 | [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | SFT/DPO | Mistral-7B-v0.1 | GPT-4 | 7.34 | 61.07 | 63.74 | 84.19 | 78.06 | 34.04 |
 | [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | SFT | Mistral-7B-v0.1 | - | 7.6** | 60.78 | 63.14 | 84.88 | 77.19 | 40.03 |
 | Merlinite-7b | Large-scale Alignment for chatBots (LAB) | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.66 | 64.88 | 63.99 | 84.37 | 78.24 | 44.58 |
-| Merlinite-7b-pt | LAB + RLAIF | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.96 | 63.59 | 64.50 | 84.28 | 79.72 | 48.67 |
+| Merlinite-7b-pt | LAB + RLAIF | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.96*** | 63.59 | 64.50 | 84.28 | 79.72 | 48.67 |
 
 
 [*] Numbers for models other than Merlinite-7b, Merlinite-7b-pt and [Labradorite-13b](https://huggingface.co/ibm/labradorite-13b) (ours) are taken from [lmsys/chatbot-arena-leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
 
 [**] Numbers taken from [MistralAI Release Blog](https://mistral.ai/news/la-plateforme/)
 
-[**] Merlinite-7b-pt model exhibits variability on the MT-Bench evaluation. The 5-run average score is 7.85, with highest 7.96 and lowest score 7.80.
+[***] The Merlinite-7b-pt model exhibits variability on the MT-Bench evaluation. The 5-run average score is 7.85, with a highest score of 7.96 and a lowest of 7.80.
 
 ### Method
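The preference-tuning recipe described in the Overview (a DPO log-ratio as the reward signal, used for rejection sampling over candidate responses) can be sketched as below. This is a minimal illustrative sketch, not the model card's actual training code: the `beta` value and the toy log-probabilities are made-up assumptions, and in practice the log-probabilities would come from scoring each sampled response under the DPO-tuned reward model and its reference.

```python
def dpo_reward(logp_policy, logp_ref, beta=0.1):
    # DPO implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)),
    # where each argument is the summed token log-probability of a
    # full response under the (policy, reference) model pair.
    return beta * (logp_policy - logp_ref)

def rejection_sample(candidates, beta=0.1):
    # candidates: list of (response, logp_policy, logp_ref) tuples.
    # Keep the response the DPO log-ratio reward ranks highest;
    # finetuning on such selections and repeating gives the
    # iterative rejection-sampling loop.
    return max(candidates, key=lambda c: dpo_reward(c[1], c[2], beta))

# Toy example: three sampled responses with made-up log-probabilities.
cands = [
    ("resp_a", -12.0, -11.0),   # reward = 0.1 * (-1.0) = -0.10
    ("resp_b", -10.0, -11.5),   # reward = 0.1 * (+1.5) = +0.15
    ("resp_c", -11.0, -11.0),   # reward = 0.1 * ( 0.0) =  0.00
]
best = rejection_sample(cands)
print(best[0])  # prints: resp_b
```

The key design point this illustrates is that no separately trained reward model is needed: the log-ratio between the preference-tuned model and its reference already encodes a scalar preference score.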