leaderboard-pt-pr-bot committed on
Commit 1cfdbbe
1 Parent(s): d632a1c

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +135 -0
README.md CHANGED
@@ -106,6 +106,124 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=22h/open-cabrita3b
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 17.98
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 21.14
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 22.69
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.01
+      name: f1-macro
+    - type: pearson
+      value: 8.92
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.97
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 50.46
+      name: f1-macro
+    - type: f1_macro
+      value: 41.19
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 47.96
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
 ---
 The Cabrita model is a collection of continued pre-trained and tokenizer-adapted models for the Portuguese language.
 This artifact is the 3 billion size variant.
@@ -136,3 +254,20 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |59.43|
 |GSM8k (5-shot) | 0.99|
 
+
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/22h/open-cabrita3b)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**33.04**|
+|ENEM Challenge (No Images)|    17.98|
+|BLUEX (No Images)         |    21.14|
+|OAB Exams                 |    22.69|
+|Assin2 RTE                |    43.01|
+|Assin2 STS                |     8.92|
+|FaQuAD NLI                |    43.97|
+|HateBR Binary             |    50.46|
+|PT Hate Speech Binary     |    41.19|
+|tweetSentBR               |    47.96|
+
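
The Average row can be sanity-checked against the nine task scores in the results table. The sketch below is not part of the PR; it assumes the leaderboard average is an unweighted arithmetic mean rounded to two decimals, which is an assumption about the aggregation rather than something the PR states. The names `scores` and `leaderboard_average` are hypothetical.

```python
# Scores copied from the results table added by this PR (22h/open-cabrita3b).
# In the model-index metadata, 8.92 (pearson) is nested under "Assin2 RTE" and
# 41.19 (f1_macro) under "HateBR Binary"; the table labels them as
# "Assin2 STS" and "PT Hate Speech Binary", and those labels are used here.
scores = {
    "ENEM Challenge (No Images)": 17.98,
    "BLUEX (No Images)": 21.14,
    "OAB Exams": 22.69,
    "Assin2 RTE": 43.01,
    "Assin2 STS": 8.92,
    "FaQuAD NLI": 43.97,
    "HateBR Binary": 50.46,
    "PT Hate Speech Binary": 41.19,
    "tweetSentBR": 47.96,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed aggregation rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 33.04"
```

Under that assumption the computed mean matches the **33.04** reported in the table.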