Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +166 -3
README.md CHANGED
@@ -9,6 +9,8 @@ tags:
  - gemma
  - portugues
  - instrucao
  pipeline_tag: text-generation
  widget:
  - text: Me explique como funciona um computador.
@@ -19,8 +21,153 @@ widget:
    example_title: História.
  - text: Escreva um poema bem interessante sobre o Sol e as flores.
    example_title: Escreva um poema.
- datasets:
- - rhaymison/superset
  ---

  # gemma-portuguese-2b-luana
@@ -151,4 +298,20 @@ email: rhaymisoncristian@gmail.com
  <a href="https://github.com/rhaymisonbetini" target="_blank">
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
  </a>
- </div>
  - gemma
  - portugues
  - instrucao
+ datasets:
+ - rhaymison/superset
  pipeline_tag: text-generation
  widget:
  - text: Me explique como funciona um computador.

    example_title: História.
  - text: Escreva um poema bem interessante sobre o Sol e as flores.
    example_title: Escreva um poema.
+ model-index:
+ - name: gemma-portuguese-luana-2b
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 24.42
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 24.34
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 27.11
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 70.86
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 1.51
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.97
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 40.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 51.83
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 30.42
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/gemma-portuguese-luana-2b
+       name: Open Portuguese LLM Leaderboard
  ---

  # gemma-portuguese-2b-luana

  <a href="https://github.com/rhaymisonbetini" target="_blank">
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
  </a>
+ </div>
+ # Open Portuguese LLM Leaderboard Evaluation Results
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/gemma-portuguese-luana-2b) and on the [Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**34.94**|
+ |ENEM Challenge (No Images)| 24.42|
+ |BLUEX (No Images) | 24.34|
+ |OAB Exams | 27.11|
+ |Assin2 RTE | 70.86|
+ |Assin2 STS | 1.51|
+ |FaQuAD NLI | 43.97|
+ |HateBR Binary | 40.05|
+ |PT Hate Speech Binary | 51.83|
+ |tweetSentBR | 30.42|
+
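
As a quick sanity check on the table above, the reported Average can be compared against the plain arithmetic mean of the nine per-task scores. This is a minimal sketch, not the leaderboard's own code: the dictionary `scores` and the variable `mean_score` are hypothetical names, the values are copied from the table, and it assumes the Average is an unweighted mean over all nine tasks. The mean of the rounded values lands slightly above the reported 34.94, which would be consistent with the leaderboard averaging unrounded scores (an assumption, not something the diff states).

```python
# Per-task scores copied from the "Metric | Value" table in this PR.
scores = {
    "ENEM Challenge (No Images)": 24.42,
    "BLUEX (No Images)": 24.34,
    "OAB Exams": 27.11,
    "Assin2 RTE": 70.86,
    "Assin2 STS": 1.51,
    "FaQuAD NLI": 43.97,
    "HateBR Binary": 40.05,
    "PT Hate Speech Binary": 51.83,
    "tweetSentBR": 30.42,
}

# Assumption: the leaderboard "Average" is an unweighted mean over all tasks.
mean_score = sum(scores.values()) / len(scores)

reported_average = 34.94  # value shown in the table
difference = abs(mean_score - reported_average)

print(f"mean of listed scores: {mean_score:.4f}")
print(f"reported average:      {reported_average:.2f}")
print(f"absolute difference:   {difference:.4f}")
```

A difference below 0.01 is what rounding each task score to two decimals could plausibly introduce, so the table is internally consistent under that assumption.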