Files changed (1)
README.md +6 -5

README.md CHANGED
@@ -18,9 +18,10 @@ base_model: mistralai/Mistral-7B-v0.1
 # Model Card for Merlinite-7B-pt 🔥
 
 ### Overview
-We introduce **Merlinite-7B-pt**, a strong open-source chat model, aligned using AI feedback **without using any human annotation or proprietary model**.
-
-**Merlinite-7B-pt** is first supervised-finetuned (SFT) via [LAB](https://arxiv.org/abs/2403.01081) using Mistral-7B-v0.1 as base model, and then preference-tuned via AI feedback. Our preference tuning recipe uses the DPO reward from Mixtral-8x7B-Instruct-v0.1 as the proxy for human preferences, and applies iterative rejection sampling to finetune the SFT policy. We show that DPO log-ratios can serve as a reliable reward signal, showing clear correlation between reward improvements and Mt-Bench improvements.
+We introduce **Merlinite-7B-pt**, a strong open-source chat model, aligned using AI feedback **without using any proprietary models or human annotation**.
+- **Merlinite-7B-pt** is first supervised-finetuned (SFT) via [LAB](https://arxiv.org/abs/2403.01081) using Mistral-7B-v0.1 as the base model, and then preference-tuned via AI feedback.
+- Our preference-tuning recipe uses the DPO reward from Mixtral-8x7B-Instruct-v0.1 as the proxy for human preferences, and applies iterative rejection sampling to finetune the SFT policy.
+- We show that DPO log-ratios can serve as a reliable reward signal, with a clear correlation between reward improvements and MT-Bench improvements.
 
 
 The official **Merlinite-7B-pt** achieves **7.96** on MT-Bench, surpassing Mistral-7B-Instruct-v0.1 and Llama2-70b-chat, and is comparable to small proprietary models like GPT3.5-Turbo-0314 and Claude-v1. It also exhibits superior instruction-following and human preference compared to the SFT Merlinite-7B model.
@@ -39,14 +40,14 @@ The official **Merlinite-7B-pt** achieves **7.96** on MT-Bench, surpassing Mistr
 | [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | SFT/DPO | Mistral-7B-v0.1 | GPT-4 | 7.34 | 61.07 | 63.74 | 84.19 | 78.06 | 34.04 |
 | [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | SFT | Mistral-7B-v0.1 | - | 7.6** | 60.78 | 63.14 | 84.88 | 77.19 | 40.03 |
 | Merlinite-7b | Large-scale Alignment for chatBots (LAB) | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.66 | 64.88 | 63.99 | 84.37 | 78.24 | 44.58 |
-| Merlinite-7b-pt | LAB + RLAIF | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.96 | 63.59 | 64.50 | 84.28 | 79.72 | 48.67 |
+| Merlinite-7b-pt | LAB + RLAIF | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.96*** | 63.59 | 64.50 | 84.28 | 79.72 | 48.67 |
 
 
 [*] Numbers for models other than Merlinite-7b, Merlinite-7b-pt and [Labradorite-13b](https://huggingface.co/ibm/labradorite-13b) (ours) are taken from [lmsys/chatbot-arena-leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
 
 [**] Numbers taken from [MistralAI Release Blog](https://mistral.ai/news/la-plateforme/)
 
-[**] Merlinite-7b-pt model exhibits variability on the MT-Bench evaluation. The 5-run average score is 7.85, with highest 7.96 and lowest score 7.80.
+[***] The Merlinite-7b-pt model exhibits variability on the MT-Bench evaluation. The 5-run average score is 7.85, with a highest score of 7.96 and a lowest of 7.80.
 
 ### Method
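The preference-tuning recipe described in the Overview (a DPO log-ratio as the reward signal, used for rejection sampling over candidate responses) can be sketched as below. This is a minimal illustrative sketch, not the model card's actual training code: the `beta` value and the toy log-probabilities are made-up assumptions, and in practice the log-probabilities would come from scoring each sampled response under the DPO-tuned reward model and its reference.

```python
def dpo_reward(logp_policy, logp_ref, beta=0.1):
    # DPO implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)),
    # where each argument is the summed token log-probability of a
    # full response under the (policy, reference) model pair.
    return beta * (logp_policy - logp_ref)

def rejection_sample(candidates, beta=0.1):
    # candidates: list of (response, logp_policy, logp_ref) tuples.
    # Keep the response the DPO log-ratio reward ranks highest;
    # finetuning on such selections and repeating gives the
    # iterative rejection-sampling loop.
    return max(candidates, key=lambda c: dpo_reward(c[1], c[2], beta))

# Toy example: three sampled responses with made-up log-probabilities.
cands = [
    ("resp_a", -12.0, -11.0),   # reward = 0.1 * (-1.0) = -0.10
    ("resp_b", -10.0, -11.5),   # reward = 0.1 * (+1.5) = +0.15
    ("resp_c", -11.0, -11.0),   # reward = 0.1 * ( 0.0) =  0.00
]
best = rejection_sample(cands)
print(best[0])  # prints: resp_b
```

The key design point this illustrates is that no separately trained reward model is needed: the log-ratio between the preference-tuned model and its reference already encodes a scalar preference score.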