Update README.md
README.md
</tbody></table>

*denotes that the model is finetuned without reinforcement learning from human feedback (RLHF).
We evaluate Infinity-Instruct-3M-0613-Mistral-7B on the two most popular instruction-following benchmarks. MT-Bench is a set of challenging multi-turn questions covering code, math, and routine dialogue. AlpacaEval 2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge model answers, and AlpacaEval 2.0 shows a high agreement rate with the human-annotated benchmark Chatbot Arena. InfInstruct-3M-0613-Mistral-7B achieves 25.5 on AlpacaEval 2.0, higher than the 22.5 of GPT-3.5 Turbo, even though it does not use RLHF. It also achieves 8.1 on MT-Bench, comparable to state-of-the-art billion-parameter LLMs such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.

## Performance on **Downstream tasks**