YikangS committed on
Commit
fe991da
1 Parent(s): 89856fc

update readme

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -15,11 +15,11 @@ To our surprise, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and D
 Compared to a model with similar training and inference computation, like Gemma-2B, JetMoE-8B achieves significantly better performance.
 
 ## Evaluation Results
-For most benchmarks, we use the same evaluation methodology as in the Open LLM leaderboard. For code benchmarks, we use the same evaluation methodology as in the LLaMA2 and Deepseek MoE paper. The evaluation results are as follows:
+For most benchmarks, we use the same evaluation methodology as in the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). For code benchmarks, we use the same evaluation methodology as in the LLaMA2 and Deepseek MoE paper. The evaluation results are as follows:
 |Model|Activate Params|Training Tokens|ARC-challenge|Hellaswag|MMLU|TruthfulQA|WinoGrande|GSM8k|Open LLM Leaderboard Average|MBPP|HumanEval|
 |---|---|---|---|---|---|---|---|---|---|---|---|
-|shot|||25|10|5|0|5|5||3|0|
-||||acc_norm|acc_norm|acc|mc2|acc|acc||Pass@1|Pass@1|
+|Shot|||25|10|5|0|5|5||3|0|
+|Metric|||acc_norm|acc_norm|acc|mc2|acc|acc||Pass@1|Pass@1|
 |LLaMA2-7B|7B|2T|53.1|78.6|46.9|38.8|74|14.5|51.0|20.8|12.8|
 |LLaMA-13B|13B|1T|**56.2**|**80.9**|47.7|39.5|**76.2**|7.6|51.4|22.0|15.8|
 |DeepseekMoE-16B|2.8B|2T|53.2|79.8|46.3|36.1|73.7|17.3|51.1|34.0|**25.0**|