YikangS committed
Commit b69acca
1 Parent(s): 6832b03

update readme

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -15,9 +15,11 @@ To our surprise, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and D
 Compared to Gemma-2B, a model with similar training and inference computation, JetMoE-8B achieves significantly better performance.
 
 ## Evaluation Results
+For most benchmarks, we follow the evaluation methodology of the Open LLM Leaderboard. For the code benchmarks (MBPP and HumanEval), we follow the evaluation methodology of the LLaMA2 and DeepSeekMoE papers. The evaluation results are as follows:
 |Model|Activated Params|Training Tokens|ARC-challenge|Hellaswag|MMLU|TruthfulQA|WinoGrande|GSM8k|Open LLM Leaderboard Average|MBPP|HumanEval|
 |---|---|---|---|---|---|---|---|---|---|---|---|
-||||acc|acc|acc|acc|acc|acc|acc|Pass@1|Pass@1|
+|shot|||25|10|5|0|5|5||3|0|
+||||acc_norm|acc_norm|acc|mc2|acc|acc||Pass@1|Pass@1|
 |LLaMA2-7B|7B|2T|53.1|78.6|46.9|38.8|**74**|14.5|51.0|20.8|12.8|
 |LLaMA-13B|13B|1T|**56.2**|**80.9**|47.7|39.5|**76.2**|7.6|51.4|22.0|15.8|
 |DeepseekMoE-16B|2.8B|2T|53.2|79.8|46.3|36.1|73.7|17.3|51.1|34.0|**25.0**|
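
The shot counts and metrics in the added rows match the Open LLM Leaderboard settings, which are computed with EleutherAI's lm-evaluation-harness. A minimal sketch of how such a run could look is below; this is not the authors' script, and the `jetmoe/jetmoe-8b` model id, the v0.4-style `simple_evaluate` API, and the task names are assumptions inferred from the table.

```python
# Hypothetical sketch (not the authors' script): reproducing Open LLM
# Leaderboard-style numbers with EleutherAI's lm-evaluation-harness,
# assuming its v0.4 Python API and the jetmoe/jetmoe-8b model id.
import lm_eval

# task name -> few-shot count, mirroring the "shot" row of the table
LEADERBOARD_TASKS = {
    "arc_challenge": 25,   # reported as acc_norm
    "hellaswag": 10,       # reported as acc_norm
    "mmlu": 5,             # reported as acc
    "truthfulqa_mc2": 0,   # reported as mc2
    "winogrande": 5,       # reported as acc
    "gsm8k": 5,            # reported as acc
}

for task, shots in LEADERBOARD_TASKS.items():
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=jetmoe/jetmoe-8b,trust_remote_code=True",
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, out["results"][task])
```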
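
For the MBPP and HumanEval columns, Pass@1 is the functional-correctness metric from the HumanEval paper (Chen et al., 2021): sample n completions per problem, count the c that pass the unit tests, and report an unbiased estimate of the chance that at least one of k draws passes. A minimal sketch of the standard estimator follows; the 200-sample usage is illustrative, not the authors' setting.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021, "Evaluating Large
    Language Models Trained on Code").

    n: completions sampled per problem
    c: completions that pass all unit tests
    k: evaluation budget (k=1 for the Pass@1 columns above)
    """
    if n - c < k:
        # Too few failing samples: every size-k subset contains a pass.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative numbers: 200 samples, 30 passing -> Pass@1 = 30/200 = 0.15
print(pass_at_k(n=200, c=30, k=1))
```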