Adding Evaluation Results

#2
by T145 - opened
Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -21,6 +21,105 @@ tags:
  - medical
  - healthcare
  pipeline_tag: question-answering
+ model-index:
+ - name: Llama3.1-Aloe-Beta-8B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 72.53
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 30.37
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 1.66
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 2.46
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 6.83
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 28.67
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B
+       name: Open LLM Leaderboard
  ---
  <p align="center">
  <picture>
@@ -382,4 +481,18 @@ If you use this repository in a published work, please cite the corresponding pa
  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/HPAI-BSC__Llama3.1-Aloe-Beta-8B-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=HPAI-BSC%2FLlama3.1-Aloe-Beta-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ | Metric |Value (%)|
+ |-------------------|--------:|
+ |**Average** | 23.75|
+ |IFEval (0-Shot) | 72.53|
+ |BBH (3-Shot) | 30.37|
+ |MATH Lvl 5 (4-Shot)| 1.66|
+ |GPQA (0-shot) | 2.46|
+ |MuSR (0-shot) | 6.83|
+ |MMLU-PRO (5-shot) | 28.67|
+
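
A note for reviewers: the `model-index` block this PR adds is machine-readable, so it can be sanity-checked locally. Below is a minimal sketch, not part of the diff above, that reads the metadata back through `huggingface_hub`; it assumes a recent release in which the parsed `model-index` is exposed as `ModelCardData.eval_results`, so treat that attribute and the `EvalResult` field names as assumptions about the library rather than part of this PR.

```python
# Minimal sketch (assumes a recent huggingface_hub that parses the
# `model-index` YAML into ModelCardData.eval_results; not part of this PR).
from huggingface_hub import ModelCard

card = ModelCard.load("HPAI-BSC/Llama3.1-Aloe-Beta-8B")

# Each EvalResult corresponds to one entry under `results:` in the YAML above.
for r in card.data.eval_results:
    print(f"{r.dataset_name:20} {r.metric_type:55} {r.metric_value}")
```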
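The **Average** row in the table is the plain arithmetic mean of the six benchmark scores, and it checks out: (72.53 + 30.37 + 1.66 + 2.46 + 6.83 + 28.67) / 6 = 142.52 / 6 ≈ 23.75. A one-liner to recompute it:

```python
# Sanity-check the "Average" row of the table above: plain mean of the six scores.
scores = [72.53, 30.37, 1.66, 2.46, 6.83, 28.67]
print(f"Average: {sum(scores) / len(scores):.2f}")  # -> 23.75
```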