vicgalle leaderboard-pr-bot committed on
Commit e85b30c
1 Parent(s): 476cd2a

Adding Evaluation Results (#5)

- Adding Evaluation Results (cf75a89f1bd27971f063f24bfd5c5f98bbceff04)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -106,6 +106,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 52.12
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 33.99
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 4.76
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.6
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.11
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.19
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B-truthy
+      name: Open LLM Leaderboard
 ---
 
 
@@ -122,3 +214,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |83.82|
 |GSM8k (5-shot) |66.11|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_vicgalle__CarbonBeagle-11B-truthy)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |21.29|
+|IFEval (0-Shot)    |52.12|
+|BBH (3-Shot)       |33.99|
+|MATH Lvl 5 (4-Shot)| 4.76|
+|GPQA (0-shot)      | 6.60|
+|MuSR (0-shot)      | 4.11|
+|MMLU-PRO (5-shot)  |26.19|
+
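
Note on the `Avg.` row: the commit does not say how the average is computed. A minimal sketch, assuming it is the unweighted mean of the six benchmark scores in the added table, is shown below. The helper names `parse_scores` and `mean_score` are hypothetical and are not part of the leaderboard tooling. The scores are copied from the table as displayed, so they are already rounded to two decimals and the recomputed mean can differ from the published value in the last digit.

```python
# Benchmark rows copied from the table added in this commit (Avg. row omitted).
TABLE = """\
|IFEval (0-Shot)    |52.12|
|BBH (3-Shot)       |33.99|
|MATH Lvl 5 (4-Shot)| 4.76|
|GPQA (0-shot)      | 6.60|
|MuSR (0-shot)      | 4.11|
|MMLU-PRO (5-shot)  |26.19|
"""


def parse_scores(table: str) -> dict[str, float]:
    """Parse two-column Markdown table rows into {metric name: score}."""
    scores: dict[str, float] = {}
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column row
        name, value = cells
        try:
            scores[name] = float(value)
        except ValueError:
            continue  # header or separator row, not a numeric score
    return scores


def mean_score(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean of the scores (assumed definition of Avg.)."""
    return sum(scores.values()) / len(scores)


scores = parse_scores(TABLE)
average = mean_score(scores)
print(f"mean of {len(scores)} benchmarks: {average:.3f}")
```

Under this assumption the recomputed mean lands within rounding distance of the published 21.29, which is consistent with a plain average of the six scores.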