leaderboard-pt-pr-bot committed
Commit 2bb1694
1 Parent(s): 1bdd0bd

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
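
For reference, once this PR is merged the evaluation results live in the `model-index` block of the card's YAML front matter and can be read back programmatically. A minimal sketch, assuming the `huggingface_hub` `ModelCard`/`EvalResult` API:

```python
# Minimal sketch: read the Open Portuguese LLM Leaderboard results added by this PR.
# Assumes the `huggingface_hub` library and that this PR has been merged into the card.
from huggingface_hub import ModelCard

card = ModelCard.load("vicgalle/ConfigurableBeagle-11B")

# `card.data.eval_results` is the parsed `model-index` block (a list of EvalResult objects).
for result in card.data.eval_results or []:
    if result.source_name == "Open Portuguese LLM Leaderboard":
        print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```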

Files changed (1)
  1. README.md +162 -0
README.md CHANGED
@@ -198,6 +198,150 @@ model-index:
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 68.79
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 58.97
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 47.56
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 92.77
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 81.58
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 77.82
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 82.91
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 71.7
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 70.75
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open Portuguese LLM Leaderboard
---

# ConfigurableBeagle-11B
@@ -255,3 +399,21 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
|MuSR (0-shot) | 7.38|
|MMLU-PRO (5-shot) |26.38|

+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/vicgalle/ConfigurableBeagle-11B) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric | Value |
+|--------------------------|---------|
+|Average |**72.54**|
+|ENEM Challenge (No Images)| 68.79|
+|BLUEX (No Images) | 58.97|
+|OAB Exams | 47.56|
+|Assin2 RTE | 92.77|
+|Assin2 STS | 81.58|
+|FaQuAD NLI | 77.82|
+|HateBR Binary | 82.91|
+|PT Hate Speech Binary | 71.70|
+|tweetSentBR | 70.75|
+
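
As a quick sanity check on the table this PR adds, the Average row appears to be the plain unweighted mean of the nine per-task scores; that aggregation rule is an assumption here, not something stated in the diff itself:

```python
# Sketch: recompute the reported Average from the nine per-task scores above,
# assuming a simple unweighted mean (assumed aggregation, not confirmed by the PR).
scores = {
    "ENEM Challenge (No Images)": 68.79,
    "BLUEX (No Images)": 58.97,
    "OAB Exams": 47.56,
    "Assin2 RTE": 92.77,
    "Assin2 STS": 81.58,
    "FaQuAD NLI": 77.82,
    "HateBR Binary": 82.91,
    "PT Hate Speech Binary": 71.70,
    "tweetSentBR": 70.75,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 72.54, matching the Average row in the table
```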