Adding the Open Portuguese LLM Leaderboard Evaluation Results

#2
Files changed (1)
  1. README.md +167 -1
README.md CHANGED
@@ -9,6 +9,153 @@ datasets:
  - argilla/ultrafeedback-binarized-preferences-cleaned
  - jondurbin/truthy-dpo-v0.1
  pipeline_tag: text-generation
  ---

  # Faro-Yi-9B-DPO
@@ -56,4 +203,23 @@ generated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.5)
  response = tokenizer.decode(generated_ids[0], skip_special_tokens=True) # Aye, matey! The Pythagorean theorem is a nautical rule that helps us find the length of the third side of a triangle. ...
  ```

- </details>
  - argilla/ultrafeedback-binarized-preferences-cleaned
  - jondurbin/truthy-dpo-v0.1
  pipeline_tag: text-generation
+ model-index:
+ - name: Faro-Yi-34B-DPO
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 73.55
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 65.79
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 55.85
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 92.2
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 79.78
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 71.0
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 85.12
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 68.88
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.24
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B-DPO
+       name: Open Portuguese LLM Leaderboard
  ---

  # Faro-Yi-9B-DPO
 
  response = tokenizer.decode(generated_ids[0], skip_special_tokens=True) # Aye, matey! The Pythagorean theorem is a nautical rule that helps us find the length of the third side of a triangle. ...
  ```

+ </details>
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/wenbopan/Faro-Yi-34B-DPO) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**73.82**|
+ |ENEM Challenge (No Images)|    73.55|
+ |BLUEX (No Images)         |    65.79|
+ |OAB Exams                 |    55.85|
+ |Assin2 RTE                |    92.20|
+ |Assin2 STS                |    79.78|
+ |FaQuAD NLI                |    71.00|
+ |HateBR Binary             |    85.12|
+ |PT Hate Speech Binary     |    68.88|
+ |tweetSentBR               |    72.24|
+
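
The Average row in the results table appears to be the unweighted mean of the nine task scores. That reading is an assumption based only on the numbers in this diff, not on the leaderboard's documentation. A minimal Python sketch that checks it, with the scores copied by hand from the table (the `scores` dict and `mean_score` helper are illustrative names, not part of the PR):

```python
# Task scores copied from the results table in this PR.
scores = {
    "ENEM Challenge (No Images)": 73.55,
    "BLUEX (No Images)": 65.79,
    "OAB Exams": 55.85,
    "Assin2 RTE": 92.20,
    "Assin2 STS": 79.78,
    "FaQuAD NLI": 71.00,
    "HateBR Binary": 85.12,
    "PT Hate Speech Binary": 68.88,
    "tweetSentBR": 72.24,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed aggregation, see lead-in)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(f"{average:.2f}")  # prints 73.82, matching the Average row
```

Under that assumption the recomputed value agrees with the reported 73.82, so the table and the `model-index` metadata are at least internally consistent.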