Commit 78d0a3b
1 parent: fe2536a

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#7)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (b811ce7f9d6a51eeb5de5a794b7567febf58875c)


Co-authored-by: Open PT LLM Leaderboard PR Bot <leaderboard-pt-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +146 -9
README.md CHANGED
@@ -1,14 +1,9 @@
  ---
- license: mit
  language:
  - pt
  - en
- metrics:
- - accuracy
- - f1
- - precision
- - recall
- pipeline_tag: text-generation
+ license: mit
+ library_name: peft
  tags:
  - LLM
  - Portuguese
@@ -16,8 +11,134 @@ tags:
  - Alpaca
  - Llama 2
  - Q&A
- library_name: peft
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ pipeline_tag: text-generation
  inference: false
+ model-index:
+ - name: bode-7b-alpaca-pt-br
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 34.36
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 28.93
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 30.84
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 79.83
+       name: f1-macro
+     - type: pearson
+       value: 43.47
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 67.45
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 85.06
+       name: f1-macro
+     - type: f1_macro
+       value: 65.73
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 43.25
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+       name: Open Portuguese LLM Leaderboard
  ---
  
  # BODE
@@ -147,4 +268,20 @@ Contribuições para a melhoria deste modelo são bem-vindas. Sinta-se à vontad
  
  ## Acknowledgments
  
- We thank the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer.
+ We thank the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer.
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/bode-7b-alpaca-pt-br)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**53.21**|
+ |ENEM Challenge (No Images)|    34.36|
+ |BLUEX (No Images)         |    28.93|
+ |OAB Exams                 |    30.84|
+ |Assin2 RTE                |    79.83|
+ |Assin2 STS                |    43.47|
+ |FaQuAD NLI                |    67.45|
+ |HateBR Binary             |    85.06|
+ |PT Hate Speech Binary     |    65.73|
+ |tweetSentBR               |    43.25|
+
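The Average row in the table above appears to be the unweighted mean of the nine task scores. The sketch below checks that reading; the aggregation rule is an assumption inferred from the numbers, not something the commit states, and `leaderboard_average` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from the results table added to README.md in this commit.
scores = {
    "ENEM Challenge (No Images)": 34.36,
    "BLUEX (No Images)": 28.93,
    "OAB Exams": 30.84,
    "Assin2 RTE": 79.83,
    "Assin2 STS": 43.47,
    "FaQuAD NLI": 67.45,
    "HateBR Binary": 85.06,
    "PT Hate Speech Binary": 65.73,
    "tweetSentBR": 43.25,
}


def leaderboard_average(task_scores):
    """Unweighted mean of the task scores, rounded to two decimals.

    Assumption: every task carries equal weight in the reported average.
    """
    return round(mean(task_scores.values()), 2)


average = leaderboard_average(scores)
print(average)  # 53.21
```

The result matches the reported **53.21**, which is consistent with equal weighting across tasks.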