YikangS committed
Commit b69acca
1 Parent(s): 6832b03

update readme

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -15,9 +15,11 @@ To our surprise, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and D
 Compared to Gemma-2B, a model with similar training and inference computation, JetMoE-8B achieves significantly better performance.
 
 ## Evaluation Results
+For most benchmarks, we follow the evaluation methodology of the Open LLM Leaderboard. For the code benchmarks (MBPP and HumanEval), we follow the evaluation methodology of the LLaMA2 and DeepSeekMoE papers. The evaluation results are as follows:
 |Model|Activated Params|Training Tokens|ARC-challenge|Hellaswag|MMLU|TruthfulQA|WinoGrande|GSM8k|Open LLM Leaderboard Average|MBPP|HumanEval|
 |---|---|---|---|---|---|---|---|---|---|---|---|
-||||acc|acc|acc|acc|acc|acc|acc|Pass@1|Pass@1|
+|shot|||25|10|5|0|5|5||3|0|
+||||acc_norm|acc_norm|acc|mc2|acc|acc||Pass@1|Pass@1|
 |LLaMA2-7B|7B|2T|53.1|78.6|46.9|38.8|**74**|14.5|51.0|20.8|12.8|
 |LLaMA-13B|13B|1T|**56.2**|**80.9**|47.7|39.5|**76.2**|7.6|51.4|22.0|15.8|
 |DeepseekMoE-16B|2.8B|2T|53.2|79.8|46.3|36.1|73.7|17.3|51.1|34.0|**25.0**|
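
The shot counts and metrics in the added rows match the Open LLM Leaderboard settings, which are computed with EleutherAI's lm-evaluation-harness. A minimal sketch of how such a run could look is below; this is not the authors' script, and the `jetmoe/jetmoe-8b` model id, the v0.4-style `simple_evaluate` API, and the task names are assumptions inferred from the table.

```python
# Hypothetical sketch (not the authors' script): reproducing Open LLM
# Leaderboard-style numbers with EleutherAI's lm-evaluation-harness,
# assuming its v0.4 Python API and the jetmoe/jetmoe-8b model id.
import lm_eval

# task name -> few-shot count, mirroring the "shot" row of the table
LEADERBOARD_TASKS = {
    "arc_challenge": 25,   # reported as acc_norm
    "hellaswag": 10,       # reported as acc_norm
    "mmlu": 5,             # reported as acc
    "truthfulqa_mc2": 0,   # reported as mc2
    "winogrande": 5,       # reported as acc
    "gsm8k": 5,            # reported as acc
}

for task, shots in LEADERBOARD_TASKS.items():
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=jetmoe/jetmoe-8b,trust_remote_code=True",
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, out["results"][task])
```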
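
For the MBPP and HumanEval columns, Pass@1 is the functional-correctness metric from the HumanEval paper (Chen et al., 2021): sample n completions per problem, count the c that pass the unit tests, and report an unbiased estimate of the chance that at least one of k draws passes. A minimal sketch of the standard estimator follows; the 200-sample usage is illustrative, not the authors' setting.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021, "Evaluating Large
    Language Models Trained on Code").

    n: completions sampled per problem
    c: completions that pass all unit tests
    k: evaluation budget (k=1 for the Pass@1 columns above)
    """
    if n - c < k:
        # Too few failing samples: every size-k subset contains a pass.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative numbers: 200 samples, 30 passing -> Pass@1 = 30/200 = 0.15
print(pass_at_k(n=200, c=30, k=1))
```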