leaderboard-pt-pr-bot committed on
Commit
039264b
1 Parent(s): 4c2faab

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +165 -1
README.md CHANGED
@@ -1,7 +1,154 @@
  ---
- license: apache-2.0
  language:
  - pt
  ---
  **Model Name:** Legislinho
 
@@ -53,3 +200,20 @@ print(answer.split(prompt)[1])
  If you would like to support the development of Legislinho and other projects in the works, consider donating:
 
  <a href='https://ko-fi.com/maguscorp' target='_blank'><img height='35' style='border:0px;height:46px;' src='https://az743702.vo.msecnd.net/cdn/kofi3.png?v=0' border='0' alt='Buy Me a Coffee at ko-fi.com'/>
  ---
  language:
  - pt
+ license: apache-2.0
+ model-index:
+ - name: legislinho
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 63.05
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 51.04
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 43.23
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 88.7
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 67.76
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 63.8
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.64
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 65.63
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 56.52
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MagusCorp/legislinho
+       name: Open Portuguese LLM Leaderboard
  ---
  **Model Name:** Legislinho
 
  If you would like to support the development of Legislinho and other projects in the works, consider donating:
 
  <a href='https://ko-fi.com/maguscorp' target='_blank'><img height='35' style='border:0px;height:46px;' src='https://az743702.vo.msecnd.net/cdn/kofi3.png?v=0' border='0' alt='Buy Me a Coffee at ko-fi.com'/>
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/MagusCorp/legislinho) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
+
+ |          Metric          | Value  |
+ |--------------------------|--------|
+ |Average                   |**63.6**|
+ |ENEM Challenge (No Images)|   63.05|
+ |BLUEX (No Images)         |   51.04|
+ |OAB Exams                 |   43.23|
+ |Assin2 RTE                |   88.70|
+ |Assin2 STS                |   67.76|
+ |FaQuAD NLI                |   63.80|
+ |HateBR Binary             |   72.64|
+ |PT Hate Speech Binary     |   65.63|
+ |tweetSentBR               |   56.52|
+
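The table's Average row is consistent with an unweighted arithmetic mean of the nine task scores: they sum to 572.37, and 572.37 / 9 ≈ 63.597, which rounds to the reported 63.6. The Python sketch below reproduces that check and re-renders a table in the same shape. The aggregation rule is an assumption inferred from the numbers, not taken from the leaderboard's own code, and the names `SCORES`, `leaderboard_average`, and `render_table` are hypothetical.

```python
# Task scores copied from the results table (task name -> score).
SCORES: dict[str, float] = {
    "ENEM Challenge (No Images)": 63.05,
    "BLUEX (No Images)": 51.04,
    "OAB Exams": 43.23,
    "Assin2 RTE": 88.70,
    "Assin2 STS": 67.76,
    "FaQuAD NLI": 63.80,
    "HateBR Binary": 72.64,
    "PT Hate Speech Binary": 65.63,
    "tweetSentBR": 56.52,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean over all task scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("no scores to average")
    return sum(scores.values()) / len(scores)


def render_table(scores: dict[str, float]) -> str:
    """Render a Markdown table shaped like the model card's results table."""
    avg = leaderboard_average(scores)
    width = max(len(name) for name in scores)  # pad to the widest task name
    lines = [
        f"|{'Metric'.center(width)}| Value  |",
        f"|{'-' * width}|--------|",
        f"|{'Average'.ljust(width)}|**{avg:.1f}**|",  # bold average, 1 decimal
    ]
    for name, value in scores.items():
        # Right-align each score in an 8-character column with 2 decimals.
        lines.append(f"|{name.ljust(width)}|{value:8.2f}|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(SCORES))
```

Computing the average from the per-task values, rather than hard-coding it, keeps the summary row in sync if any single score is later corrected.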