update readme
README.md
CHANGED
@@ -15,11 +15,11 @@ To our surprise, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and D
 Compared to a model with similar training and inference computation, like Gemma-2B, JetMoE-8B achieves significantly better performance.
 
 ## Evaluation Results
 
-For most benchmarks, we use the same evaluation methodology as in the Open LLM leaderboard. For code benchmarks, we use the same evaluation methodology as in the LLaMA2 and Deepseek MoE paper. The evaluation results are as follows:
+For most benchmarks, we use the same evaluation methodology as in the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). For code benchmarks, we use the same evaluation methodology as in the LLaMA2 and Deepseek MoE paper. The evaluation results are as follows:
 |Model|Activate Params|Training Tokens|ARC-challenge|Hellaswag|MMLU|TruthfulQA|WinoGrande|GSM8k|Open LLM Leaderboard Average|MBPP|HumanEval|
 |---|---|---|---|---|---|---|---|---|---|---|---|
-
-
+|Shot|||25|10|5|0|5|5||3|0|
+|Metric|||acc_norm|acc_norm|acc|mc2|acc|acc||Pass@1|Pass@1|
 |LLaMA2-7B|7B|2T|53.1|78.6|46.9|38.8|74|14.5|51.0|20.8|12.8|
 |LLaMA-13B|13B|1T|**56.2**|**80.9**|47.7|39.5|**76.2**|7.6|51.4|22.0|15.8|
 |DeepseekMoE-16B|2.8B|2T|53.2|79.8|46.3|36.1|73.7|17.3|51.1|34.0|**25.0**|
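The "Open LLM Leaderboard Average" column in the table is the unweighted mean of the six leaderboard benchmark scores (ARC-challenge, Hellaswag, MMLU, TruthfulQA, WinoGrande, GSM8k), rounded to one decimal place; MBPP and HumanEval are excluded. A quick sketch checking this against the rows above:

```python
# Benchmark scores copied from the table, in column order:
# ARC-challenge, Hellaswag, MMLU, TruthfulQA, WinoGrande, GSM8k
# paired with the reported "Open LLM Leaderboard Average".
scores = {
    "LLaMA2-7B":       ([53.1, 78.6, 46.9, 38.8, 74.0, 14.5], 51.0),
    "LLaMA-13B":       ([56.2, 80.9, 47.7, 39.5, 76.2, 7.6], 51.4),
    "DeepseekMoE-16B": ([53.2, 79.8, 46.3, 36.1, 73.7, 17.3], 51.1),
}

for model, (benchmarks, reported) in scores.items():
    mean = sum(benchmarks) / len(benchmarks)
    # The reported average equals the unweighted mean to one decimal place.
    assert abs(mean - reported) < 0.051, (model, mean, reported)
    print(f"{model}: mean={mean:.2f}, reported={reported}")
```

Note that MBPP and HumanEval sit outside the average, which is why a model like DeepseekMoE-16B can lead on code while its leaderboard average stays close to the others.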