Commit 4f92445
Parent: 54078ab

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#2)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (0938d09db986a0bc71eeebb27917cfbc1d52f456)


Co-authored-by: Open PT LLM Leaderboard PR Bot <leaderboard-pt-pr-bot@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +140 -3
README.md CHANGED
@@ -1,9 +1,130 @@
  ---
- datasets:
- - cnmoro/WizardVicuna-PTBR-Instruct-Clean
  language:
  - en
  - pt
+ datasets:
+ - cnmoro/WizardVicuna-PTBR-Instruct-Clean
+ model-index:
+ - name: Mistral-7B-Portuguese
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 58.08
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 48.68
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 37.08
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 90.31
+       name: f1-macro
+     - type: pearson
+       value: 76.55
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 58.84
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 79.21
+       name: f1-macro
+     - type: f1_macro
+       value: 68.87
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 64.71
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
  ---

  This is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained with [unsloth](https://github.com/unslothai/unsloth) on a Portuguese instruction dataset in an attempt to improve the model's performance in Portuguese.
@@ -14,4 +135,20 @@ The original prompt format was used:

  ```plaintext
  <s>[INST] {Prompt goes here} [/INST]
- ```
+ ```
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cnmoro/Mistral-7B-Portuguese)
+
+ | Metric                     | Value    |
+ |----------------------------|----------|
+ | Average                    | **64.7** |
+ | ENEM Challenge (No Images) | 58.08    |
+ | BLUEX (No Images)          | 48.68    |
+ | OAB Exams                  | 37.08    |
+ | Assin2 RTE                 | 90.31    |
+ | Assin2 STS                 | 76.55    |
+ | FaQuAD NLI                 | 58.84    |
+ | HateBR Binary              | 79.21    |
+ | PT Hate Speech Binary      | 68.87    |
+ | tweetSentBR                | 64.71    |
+
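For reference, a minimal sketch of exercising the `[INST] ... [/INST]` prompt format documented in the README above with the Hugging Face `transformers` generation API. This is illustrative only and not part of the committed README: the repository id is inferred from the leaderboard query URL in the diff, and the example prompt and generation settings are assumptions.

```python
# Minimal sketch (assumptions noted below): load the model and apply the
# [INST] ... [/INST] prompt format documented in the README above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cnmoro/Mistral-7B-Portuguese"  # inferred from the leaderboard query URL; verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer already prepends the <s> (BOS) token, so only the
# [INST] ... [/INST] wrapper is written into the prompt string.
prompt = "[INST] Explique em poucas palavras o que é aprendizado de máquina. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)  # generation settings are illustrative
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```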