T145 committed
Commit 13e77a9 · verified · 1 Parent(s): 24ae87a

Adding Evaluation Results


This is an automated PR created with [this space](https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard)!

This PR adds evaluation results from the Open LLM Leaderboard to your model card.

Please report any issues here: https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard/discussions
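The results land in the card's `model-index` metadata block (shown in the diff below). As a rough sketch of how that structure could be assembled before serializing it to YAML front matter — the helper function here is illustrative, not the space's actual code:

```python
# Illustrative sketch of the model-index structure this PR adds.
# Field names follow the Hugging Face model-card metadata schema;
# leaderboard_entry() is a hypothetical helper, not the space's real code.
def leaderboard_entry(dataset_name, dataset_type, metric_type, value, few_shot):
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "args": {"num_few_shot": few_shot},
        },
        "metrics": [{"type": metric_type, "value": value}],
        "source": {"name": "Open LLM Leaderboard"},
    }

model_index = [{
    "name": "DeepSeek-R1-Distill-Llama-8B",
    "results": [
        leaderboard_entry("IFEval (0-Shot)", "wis-k/instruction-following-eval",
                          "inst_level_strict_acc and prompt_level_strict_acc",
                          37.82, 0),
        leaderboard_entry("BBH (3-Shot)", "SaylorTwift/bbh",
                          "acc_norm", 5.33, 3),
    ],
}]
print(model_index[0]["results"][0]["metrics"][0]["value"])  # 37.82
```

Serializing this list under a top-level `model-index:` key (e.g. with a YAML library) yields front matter equivalent to the block in the diff.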

Files changed (1)
  1. README.md +114 -0
README.md CHANGED

@@ -1,6 +1,105 @@
 ---
 license: mit
 library_name: transformers
+model-index:
+- name: DeepSeek-R1-Distill-Llama-8B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 37.82
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 5.33
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.67
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.46
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 12.1
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B
+      name: Open LLM Leaderboard
 ---
 # DeepSeek-R1
 <!-- markdownlint-disable first-line-h1 -->
@@ -234,3 +333,18 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
 
 ## 9. Contact
 If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/deepseek-ai__DeepSeek-R1-Distill-Llama-8B-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=deepseek-ai%2FDeepSeek-R1-Distill-Llama-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             | Value (%) |
+|--------------------|----------:|
+| **Average**        |      9.40 |
+| IFEval (0-Shot)    |     37.82 |
+| BBH (3-Shot)       |      5.33 |
+| MATH Lvl 5 (4-Shot)|      0.00 |
+| GPQA (0-shot)      |      0.67 |
+| MuSR (0-shot)      |      0.46 |
+| MMLU-PRO (5-shot)  |     12.10 |
+
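The **Average** row in the added table is consistent with the plain arithmetic mean of the six benchmark scores. A quick check (assumption: an unweighted mean of exactly these six values, as listed in this PR):

```python
# Verify that the table's Average (9.40) is the unweighted mean
# of the six benchmark scores added by this PR.
scores = {
    "IFEval (0-Shot)": 37.82,
    "BBH (3-Shot)": 5.33,
    "MATH Lvl 5 (4-Shot)": 0.0,
    "GPQA (0-shot)": 0.67,
    "MuSR (0-shot)": 0.46,
    "MMLU-PRO (5-shot)": 12.1,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 9.4
```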