Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +166 -0
README.md CHANGED
@@ -30,6 +30,153 @@ co2_eq_emissions:
  training_type: pre-training
  geographical_location: Germany
  hardware_used: NVIDIA A40
+ model-index:
+ - name: Mula-8x160-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 20.5
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 21.28
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 26.65
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 22.38
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 4.73
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.97
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 33.33
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 40.21
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 18.46
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MulaBR/Mula-8x160-v0.1
+       name: Open Portuguese LLM Leaderboard
  ---
  # Mula-8x160-v0.1

@@ -161,3 +308,22 @@ Mula-8x160-v0.1 is licensed under the Apache License, Version 2.0. See the [LICE
  ## Acknowledgements

  The authors gratefully acknowledge the granted access to the [Marvin cluster](https://www.hpc.uni-bonn.de/en/systems/marvin) hosted by the [University of Bonn](https://www.uni-bonn.de/en) along with the support provided by its High Performance Computing & Analytics Lab.
+ 
+ 
+ # Open Portuguese LLM Leaderboard Evaluation Results
+ 
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/MulaBR/Mula-8x160-v0.1) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
+ 
+ | Metric                     | Value     |
+ |----------------------------|-----------|
+ | Average                    | **25.72** |
+ | ENEM Challenge (No Images) | 20.50     |
+ | BLUEX (No Images)          | 21.28     |
+ | OAB Exams                  | 26.65     |
+ | Assin2 RTE                 | 22.38     |
+ | Assin2 STS                 | 4.73      |
+ | FaQuAD NLI                 | 43.97     |
+ | HateBR Binary              | 33.33     |
+ | PT Hate Speech Binary      | 40.21     |
+ | tweetSentBR                | 18.46     |
+ 
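
The `model-index` block this PR adds is the machine-readable counterpart of the table: the hub UI and leaderboard integrations parse scores from it. As a minimal sketch of consuming it programmatically, assuming the `huggingface_hub` library's `ModelCard` API (attribute names follow its `EvalResult` dataclass; verify against your installed version):

```python
# Sketch: read the eval results published by this PR's model-index block.
# Assumes huggingface_hub is installed and the PR has been merged, so the
# metadata is visible on the canonical model card.
from huggingface_hub import ModelCard

card = ModelCard.load("MulaBR/Mula-8x160-v0.1")

# card.data.eval_results is parsed from the model-index YAML; each entry
# holds one dataset/metric pair, mirroring one row of the table above.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```

Run against the merged card, this should print one line per dataset row of the table above (e.g. `FaQuAD NLI: f1_macro = 43.97`).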