Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +148 -10
README.md CHANGED
@@ -1,19 +1,140 @@
  ---
  language:
- - pt
- - en
+ - pt
+ - en
  license: cc
  tags:
- - text-generation-inference
- - transformers
- - qwen
- - gguf
- - brazil
- - brasil
- - 14b
- - portuguese
+ - text-generation-inference
+ - transformers
+ - qwen
+ - gguf
+ - brazil
+ - brasil
+ - 14b
+ - portuguese
  base_model: Qwen/Qwen1.5-14B-Chat
  pipeline_tag: text-generation
+ model-index:
+ - name: CabraQwen14b
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 75.16
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 60.78
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 49.89
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 91.42
+       name: f1-macro
+     - type: pearson
+       value: 80.85
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 46.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 79.32
+       name: f1-macro
+     - type: f1_macro
+       value: 71.8
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 62.65
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
  ---
  # Cabra Qwen 14b
  <img src="https://uploads-ssl.webflow.com/65f77c0240ae1c68f8192771/660b1a4de37e3389b7220262_cabra3.png" width="400" height="400">
@@ -161,3 +282,20 @@ O modelo é destinado, por agora, a fins de pesquisa. As áreas e tarefas de pes
  | | | exam_id__2014-15 | 3 | acc | 0.5897 | ± 0.0323 |
  | portuguese_hate_speech_binary | 1.0 | all | 25 | f1_macro | 0.7180 | ± 0.0115 |
  | | | all | 25 | acc | 0.7462 | ± 0.0106 |
+
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/nicolasdec/CabraQwen14b)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**68.66**|
+ |ENEM Challenge (No Images)| 75.16|
+ |BLUEX (No Images) | 60.78|
+ |OAB Exams | 49.89|
+ |Assin2 RTE | 91.42|
+ |Assin2 STS | 80.85|
+ |FaQuAD NLI | 46.05|
+ |HateBR Binary | 79.32|
+ |PT Hate Speech Binary | 71.80|
+ |tweetSentBR | 62.65|
+
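
The Average row in the added table appears to be the unweighted mean of the nine task scores listed beneath it. The diff does not state how the leaderboard aggregates, so the following is only a sanity-check sketch under that assumption; the names `scores` and `leaderboard_average` are hypothetical and not part of the leaderboard's code.

```python
# Task scores copied from the results table added to README.md in this diff.
scores = {
    "ENEM Challenge (No Images)": 75.16,
    "BLUEX (No Images)": 60.78,
    "OAB Exams": 49.89,
    "Assin2 RTE": 91.42,
    "Assin2 STS": 80.85,
    "FaQuAD NLI": 46.05,
    "HateBR Binary": 79.32,
    "PT Hate Speech Binary": 71.80,
    "tweetSentBR": 62.65,
}


def leaderboard_average(task_scores: dict) -> float:
    """Unweighted mean of the task scores, rounded to two decimals.

    Assumption: every task carries equal weight in the reported average.
    """
    return round(sum(task_scores.values()) / len(task_scores), 2)


average = leaderboard_average(scores)
print(f"Average: {average:.2f}")  # prints "Average: 68.66"
```

The result agrees with the **68.66** shown in the table. The same table reports PT Hate Speech Binary as 71.80, which matches the `f1_macro` of 0.7180 in the harness output a few lines earlier once expressed as a percentage.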