T145 committed on
Commit
e71c904
1 Parent(s): a419033

Adding Evaluation Results

This is an automated PR created with [this space](https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard)!

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

Please report any issues here: https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +114 -0
README.md CHANGED
@@ -2,6 +2,105 @@
  license: other
  base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  pipeline_tag: text-generation
  ---

  <div align="center">
@@ -413,3 +512,18 @@ If you find our work helpful, please feel free to cite us using the following Bi
  url={https://huggingface.co/Skywork},
  }
  ```
  license: other
  base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  pipeline_tag: text-generation
+ model-index:
+ - name: Skywork-o1-Open-Llama-3.1-8B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 35.18
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Skywork%2FSkywork-o1-Open-Llama-3.1-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 23.02
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Skywork%2FSkywork-o1-Open-Llama-3.1-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 0.0
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Skywork%2FSkywork-o1-Open-Llama-3.1-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 1.23
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Skywork%2FSkywork-o1-Open-Llama-3.1-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 1.52
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Skywork%2FSkywork-o1-Open-Llama-3.1-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 11.45
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Skywork%2FSkywork-o1-Open-Llama-3.1-8B
+       name: Open LLM Leaderboard
  ---

  <div align="center">
 
  url={https://huggingface.co/Skywork},
  }
  ```
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Skywork__Skywork-o1-Open-Llama-3.1-8B-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Skywork%2FSkywork-o1-Open-Llama-3.1-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ |      Metric       |Value (%)|
+ |-------------------|--------:|
+ |**Average**        |    12.07|
+ |IFEval (0-Shot)    |    35.18|
+ |BBH (3-Shot)       |    23.02|
+ |MATH Lvl 5 (4-Shot)|     0.00|
+ |GPQA (0-shot)      |     1.23|
+ |MuSR (0-shot)      |     1.52|
+ |MMLU-PRO (5-shot)  |    11.45|
+
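
As a quick sanity check on the table above, the **Average** row appears to be the unweighted mean of the six benchmark scores. That is an assumption made here: the PR does not state how the average is computed, and the leaderboard may derive it from unrounded values. The sketch below recomputes it from the rounded figures in the table; the names `scores` and `mean_score` are illustrative only.

```python
# Benchmark scores (in %) copied from the results table in this PR.
scores = {
    "IFEval (0-Shot)": 35.18,
    "BBH (3-Shot)": 23.02,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 1.23,
    "MuSR (0-shot)": 1.52,
    "MMLU-PRO (5-shot)": 11.45,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals like the table.

    Assumption: every benchmark carries equal weight in the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 12.07"
```

The recomputed value matches the 12.07 reported in the table, so the six rows and the summary row are at least consistent with each other.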