jarodrigues committed: Update README.md

README.md CHANGED:
@@ -134,7 +134,7 @@ For testing, we reserved the translated datasets MRPC (similarity) and RTE (infe
 | **LLaMA-2 Chat (English)** | 0.5432 | 0.3807 | **0.5493** |
 
 <br>
 
-For further testing our decoder, in addition to the testing data described above, we also reused some of the datasets that had been resorted for
+For further testing our decoder, in addition to the testing data described above, we also reused some of the datasets that had been resorted to for PTBR to test the state-of-the-art Sabiá model and that were originally developed with materials in Portuguese: ASSIN2 RTE (entailment), ASSIN2 STS (similarity), BLUEX (question answering), ENEM 2022 (question answering) and FaQuAD (extractive question answering).
 
 The scores of Sabiá invite comparison with Gervásio's, but such a comparison needs to be taken with some caution.
 - First, these are a repetition of the scores presented in the respective paper, which only provides results for a single run of each task, while the scores of Gervásio are the average of three runs with different seeds.
@@ -146,8 +146,8 @@ To evaluate Gervásio, the examples were randomly selected to be included in the
 | Model | ENEM 2022 (Accuracy) | BLUEX (Accuracy) | RTE (F1) | STS (Pearson) |
 |--------------------------|----------------------|------------------|-----------|---------------|
 | **Gervásio 7B PTBR** | 0.1977 | 0.2640 | **0.7469** | **0.2136** |
-| **LLaMA-2** | 0.2458 | 0.2903 | 0.0913 | 0.1034 |
-| **LLaMA-2 Chat** | 0.2231 | 0.2959 | 0.5546 | 0.1750 |
+| **LLaMA-2 (English)** | 0.2458 | 0.2903 | 0.0913 | 0.1034 |
+| **LLaMA-2 Chat (English)** | 0.2231 | 0.2959 | 0.5546 | 0.1750 |
 ||||||
 | **Sabiá-7B** | **0.6017** | **0.7743** | 0.6847 | 0.1363 |