hyxmmm committed on
Commit b1f98a0 · verified · 1 Parent(s): dfbca9e

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -101,6 +101,7 @@ Thanks to [FlagScale](https://github.com/FlagOpen/FlagScale), we could concatena
 </tbody></table>

 *denotes that the model is finetuned without reinforcement learning from human feedback (RLHF).
+
 We evaluate Infinity-Instruct-3M-0613-Mistral-7B on the two most popular instruction-following benchmarks. MT-Bench is a set of challenging multi-turn questions covering code, math, and routine dialogue. AlpacaEval2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge model answers, and AlpacaEval2.0 shows a high agreement rate with the human-annotated Chatbot Arena benchmark. InfInstruct-3M-0613-Mistral-7B achieves 25.5 on AlpacaEval2.0, higher than the 22.5 of GPT-3.5 Turbo, even though it does not use RLHF. It also achieves 8.1 on MT-Bench, comparable to state-of-the-art billion-parameter LLMs such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.

 ## Performance on **Downstream tasks**
 
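Both benchmarks mentioned in the changed paragraph score answers that the model generates itself. Below is a minimal sketch of producing such an answer with 🤗 Transformers; the Hub repository id, the prompt, and the generation settings are illustrative assumptions, not part of this commit.

```python
# Hedged sketch: generate a candidate answer from the model so an external
# GPT-4 judge (as in MT-Bench / AlpacaEval2.0) can score it afterwards.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/Infinity-Instruct-3M-0613-Mistral-7B"  # assumed Hub id; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example single-turn prompt; assumes the tokenizer ships a chat template.
messages = [{"role": "user", "content": "Write a short poem about the sea."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Strip the prompt tokens and print only the newly generated answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```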