T145 committed on
Commit
33dca2e
1 Parent(s): abebc90

Update README.md

Files changed (1)
  1. README.md +12 -121
README.md CHANGED
@@ -11,104 +11,6 @@ base_model:
 model-index:
 - name: ZEUS-8B-V10
   results:
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: IFEval (0-Shot)
-      type: HuggingFaceH4/ifeval
-      args:
-        num_few_shot: 0
-    metrics:
-    - type: inst_level_strict_acc and prompt_level_strict_acc
-      value: 77.07
-      name: strict accuracy
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=T145/ZEUS-8B-V10
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: BBH (3-Shot)
-      type: BBH
-      args:
-        num_few_shot: 3
-    metrics:
-    - type: acc_norm
-      value: 32.7
-      name: normalized accuracy
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=T145/ZEUS-8B-V10
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: MATH Lvl 5 (4-Shot)
-      type: hendrycks/competition_math
-      args:
-        num_few_shot: 4
-    metrics:
-    - type: exact_match
-      value: 20.09
-      name: exact match
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=T145/ZEUS-8B-V10
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: GPQA (0-shot)
-      type: Idavidrein/gpqa
-      args:
-        num_few_shot: 0
-    metrics:
-    - type: acc_norm
-      value: 9.96
-      name: acc_norm
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=T145/ZEUS-8B-V10
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: MuSR (0-shot)
-      type: TAUR-Lab/MuSR
-      args:
-        num_few_shot: 0
-    metrics:
-    - type: acc_norm
-      value: 9.09
-      name: acc_norm
-    - type: acc_norm
-      value: 9.09
-      name: acc_norm
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=T145/ZEUS-8B-V10
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: MMLU-PRO (5-shot)
-      type: TIGER-Lab/MMLU-Pro
-      config: main
-      split: test
-      args:
-        num_few_shot: 5
-    metrics:
-    - type: acc
-      value: 32.26
-      name: accuracy
-    - type: acc
-      value: 32.26
-      name: accuracy
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=T145/ZEUS-8B-V10
-      name: Open LLM Leaderboard
   - task:
       type: text-generation
       name: Text Generation
  name: Text Generation
@@ -231,29 +133,6 @@ tokenizer_source: union
 ```
 
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/T145__ZEUS-8B-V10-details)
-
-| Metric |Value|
-|-------------------|----:|
-|Avg. |30.19|
-|IFEval (0-Shot) |77.07|
-|BBH (3-Shot) |32.70|
-|MATH Lvl 5 (4-Shot)|20.09|
-|GPQA (0-shot) | 9.96|
-|MuSR (0-shot) | 9.09|
-|MMLU-PRO (5-shot) |32.26|
-
-## Changes over V2
-
-| Metric |Change|
-|-------------------|-----:|
-|Avg. |+0.12|
-|IFEval (0-Shot) |-3.22|
-|BBH (3-Shot) |+1.09|
-|MATH Lvl 5 (4-Shot)|-1.06|
-|GPQA (0-shot) |+3.02|
-|MuSR (0-shot) |+0.85|
-|MMLU-PRO (5-shot) |+0.08|# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/T145__ZEUS-8B-V10-details)!
 Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=T145/ZEUS-8B-V10)!
 
@@ -267,3 +146,15 @@ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-
 |MuSR (0-shot) | 9.09|
 |MMLU-PRO (5-shot) | 32.26|
 
+## Changes over V2
+
+| Metric |Change|
+|-------------------|-----:|
+|Avg. |+0.12|
+|IFEval (0-Shot) |-3.22|
+|BBH (3-Shot) |+1.09|
+|MATH Lvl 5 (4-Shot)|-1.06|
+|GPQA (0-shot) |+3.02|
+|MuSR (0-shot) |+0.85|
+|MMLU-PRO (5-shot) |+0.08|
+
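
A quick sanity check on the two tables this diff touches, as a minimal sketch rather than anything taken from the repository. It assumes the `Avg.` row is the plain mean of the six benchmark scores and that each "Change" is the V10 score minus the V2 score. Neither assumption is stated in the README. The dictionary names and the `mean` helper are illustrative only.

```python
# Scores copied from the V10 results table in the README.
v10 = {
    "IFEval (0-Shot)": 77.07,
    "BBH (3-Shot)": 32.70,
    "MATH Lvl 5 (4-Shot)": 20.09,
    "GPQA (0-shot)": 9.96,
    "MuSR (0-shot)": 9.09,
    "MMLU-PRO (5-shot)": 32.26,
}

# Deltas copied from the "Changes over V2" table.
change_over_v2 = {
    "IFEval (0-Shot)": -3.22,
    "BBH (3-Shot)": 1.09,
    "MATH Lvl 5 (4-Shot)": -1.06,
    "GPQA (0-shot)": 3.02,
    "MuSR (0-shot)": 0.85,
    "MMLU-PRO (5-shot)": 0.08,
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: "Avg." is the unweighted mean of the six benchmarks.
v10_avg = mean(v10.values())
change_avg = mean(change_over_v2.values())

# Assumption: change = V10 - V2, so the implied V2 score is V10 - change.
implied_v2 = {
    name: round(v10[name] - change_over_v2[name], 2) for name in v10
}

print(f"V10 average:    {v10_avg:.3f}")
print(f"Average change: {change_avg:+.3f}")
for name, score in implied_v2.items():
    print(f"{name:<20} implied V2 = {score:.2f}")
```

Because the table values are already rounded to two decimals, the recomputed averages can only be expected to match the reported `30.19` and `+0.12` to within about 0.01.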