Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +166 -3
README.md CHANGED
@@ -1,9 +1,156 @@
  ---
+ language:
+ - pt
  license: apache-2.0
  datasets:
  - nicholasKluge/Pt-Corpus
- language:
- - pt
+ model-index:
+ - name: Mistral-7B-v0.2-Base_ptbr
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 64.94
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 53.96
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 45.42
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 90.11
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 72.51
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 69.04
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 79.62
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 58.52
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 62.32
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
+       name: Open Portuguese LLM Leaderboard
  ---
  
  This is a base model pre-trained on roughly 1B Portuguese tokens, starting from the model's official weights; it is intended for fine-tuning.
@@ -18,4 +165,20 @@ Note: Awaiting [official results](https://huggingface.co/spaces/eduagarcia/o
  | faquad_nli | 68,11 | 47,63 | 20,48 |
  | hatebr_offensive_binary | 79,65 | 77,63 | 2,02 |
  | oab_exams | 45,42 | 45,24 | 0,18 |
- | portuguese_hate_speech_binary| 59,18 | 55,72 | 3,46 |
+ | portuguese_hate_speech_binary| 59,18 | 55,72 | 3,46 |
+ # Open Portuguese LLM Leaderboard Evaluation Results
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/JJhooww/Mistral-7B-v0.2-Base_ptbr) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ 
+ | Metric                   | Value   |
+ |--------------------------|---------|
+ | Average                  | **66.27** |
+ | ENEM Challenge (No Images)| 64.94  |
+ | BLUEX (No Images)        | 53.96   |
+ | OAB Exams                | 45.42   |
+ | Assin2 RTE               | 90.11   |
+ | Assin2 STS               | 72.51   |
+ | FaQuAD NLI               | 69.04   |
+ | HateBR Binary            | 79.62   |
+ | PT Hate Speech Binary    | 58.52   |
+ | tweetSentBR              | 62.32   |
+ 
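The "Average" row added in this PR matches the arithmetic mean of the nine task scores in the table. A quick check (illustrative, not part of the PR; values copied from the diff above):

```python
# Reproduce the leaderboard "Average" as the mean of the nine task scores.
scores = {
    "ENEM Challenge (No Images)": 64.94,
    "BLUEX (No Images)": 53.96,
    "OAB Exams": 45.42,
    "Assin2 RTE": 90.11,
    "Assin2 STS": 72.51,
    "FaQuAD NLI": 69.04,
    "HateBR Binary": 79.62,
    "PT Hate Speech Binary": 58.52,
    "tweetSentBR": 62.32,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 66.27
```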
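The `model-index` block added to the front matter follows the standard Hugging Face eval-results metadata, so once the PR is merged it can be read back from the Hub. A minimal sketch, assuming the `huggingface_hub` client and network access (the key layout mirrors the YAML in the diff):

```python
# Read the evaluation results back out of the model card metadata.
from huggingface_hub import ModelCard

card = ModelCard.load("JJhooww/Mistral-7B-v0.2-Base_ptbr")
for entry in card.data.to_dict().get("model-index", []):
    for result in entry.get("results", []):
        dataset_name = result["dataset"]["name"]
        for metric in result["metrics"]:
            print(dataset_name, metric["type"], metric["value"])
```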
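Since the card describes a pre-trained base checkpoint intended for fine-tuning rather than direct chat use, here is a minimal loading sketch, assuming the `transformers` library; the repo id is taken from the leaderboard query URL in the diff, and the prompt is an illustrative placeholder:

```python
# Load the base checkpoint as a starting point for fine-tuning on Portuguese data.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JJhooww/Mistral-7B-v0.2-Base_ptbr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# As a base model it has no instruction tuning, so plain text continuation
# is the expected behaviour before any fine-tuning.
inputs = tokenizer("O Brasil é", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```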