leaderboard-pt-pr-bot committed
Commit b811ce7
1 parent: fe2536a

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
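The results are added to the card as a `model-index` block inside the YAML frontmatter, which downstream tools can read programmatically. A minimal sketch of pulling the reported metric values back out of a card file, using only the standard library with a regex in place of a real YAML parser (the `sample` text is an abbreviated stand-in, not the full block this PR adds):

```python
import re

def extract_metric_values(card_text: str) -> list[float]:
    """Return the numeric `value:` entries from a model card's YAML frontmatter."""
    # Frontmatter sits between the first pair of `---` delimiter lines.
    match = re.search(r"\A---\n(.*?)\n---", card_text, re.DOTALL)
    if not match:
        return []
    frontmatter = match.group(1)
    # Each reported metric appears as a `value: <number>` line.
    return [float(v) for v in
            re.findall(r"^\s*value:\s*([\d.]+)\s*$", frontmatter, re.MULTILINE)]

# Abbreviated sample mimicking the frontmatter this PR adds.
sample = """---
model-index:
- name: bode-7b-alpaca-pt-br
  results:
  - metrics:
    - type: acc
      value: 34.36
    - type: f1_macro
      value: 79.83
---
# BODE
"""
print(extract_metric_values(sample))  # [34.36, 79.83]
```

For anything beyond a quick scrape, a proper YAML parser (or the Hub's own card-loading utilities) is the safer choice; the regex here only handles the simple `value:` lines shown.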

Files changed (1): README.md (+146 -9)
README.md CHANGED
@@ -1,14 +1,9 @@
 ---
-license: mit
 language:
 - pt
 - en
-metrics:
-- accuracy
-- f1
-- precision
-- recall
-pipeline_tag: text-generation
+license: mit
+library_name: peft
 tags:
 - LLM
 - Portuguese
@@ -16,8 +11,134 @@ tags:
 - Alpaca
 - Llama 2
 - Q&A
-library_name: peft
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+pipeline_tag: text-generation
 inference: false
+model-index:
+- name: bode-7b-alpaca-pt-br
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 34.36
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 28.93
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 30.84
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 79.83
+      name: f1-macro
+    - type: pearson
+      value: 43.47
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 67.45
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 85.06
+      name: f1-macro
+    - type: f1_macro
+      value: 65.73
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 43.25
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-7b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # BODE
@@ -147,4 +268,20 @@ Contribuições para a melhoria deste modelo são bem-vindas. Sinta-se à vontad
 
 ## Agradecimentos
 
-Agradecemos ao Laboratório Nacional de Computação Científica (LNCC/MCTI, Brasil) por prover os recursos de CAD do supercomputador SDumont.
+Agradecemos ao Laboratório Nacional de Computação Científica (LNCC/MCTI, Brasil) por prover os recursos de CAD do supercomputador SDumont.
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/bode-7b-alpaca-pt-br)
+
+| Metric                   |  Value  |
+|--------------------------|---------|
+|Average                   |**53.21**|
+|ENEM Challenge (No Images)|    34.36|
+|BLUEX (No Images)         |    28.93|
+|OAB Exams                 |    30.84|
+|Assin2 RTE                |    79.83|
+|Assin2 STS                |    43.47|
+|FaQuAD NLI                |    67.45|
+|HateBR Binary             |    85.06|
+|PT Hate Speech Binary     |    65.73|
+|tweetSentBR               |    43.25|
+
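As a sanity check on the table the PR appends, the Average row is just the unweighted mean of the nine per-task scores, rounded to two decimals. A quick reproduction in Python (scores copied from the leaderboard table):

```python
# Per-task scores from the Open Portuguese LLM Leaderboard table above.
scores = {
    "ENEM Challenge (No Images)": 34.36,
    "BLUEX (No Images)": 28.93,
    "OAB Exams": 30.84,
    "Assin2 RTE": 79.83,
    "Assin2 STS": 43.47,
    "FaQuAD NLI": 67.45,
    "HateBR Binary": 85.06,
    "PT Hate Speech Binary": 65.73,
    "tweetSentBR": 43.25,
}

# The leaderboard average is the unweighted mean over all nine tasks.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 53.21
```

Note that Assin2 STS and PT Hate Speech Binary appear in the table but not as separate `dataset` entries in the `model-index`: they are reported there as the second metric of the Assin2 RTE and HateBR Binary tasks, respectively (pearson 43.47 and f1-macro 65.73).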