jarodrigues committed: Update README.md

README.md CHANGED:
@@ -134,7 +134,7 @@ For testing, we reserved the translated datasets MRPC (similarity) and RTE (infe
 | **LLaMA-2 Chat (English)** | 0.5432 | 0.3807 | **0.5493** |
 
 <br>
 
-For further testing our decoder, in addition to the testing data described above, we also reused some of the datasets that had been resorted for
+For further testing our decoder, in addition to the testing data described above, we also reused some of the datasets that had been resorted to for PTBR to test the state-of-the-art Sabiá model and that were originally developed with materials in Portuguese: ASSIN2 RTE (entailment), ASSIN2 STS (similarity), BLUEX (question answering), ENEM 2022 (question answering) and FaQuAD (extractive question answering).
 
 The scores of Sabiá invite comparison with Gervásio's, but such a comparison needs to be taken with some caution.
 - First, these are a repetition of the scores presented in the respective paper, which only provides results for a single run of each task, while the scores of Gervásio are the average of three runs with different seeds.
@@ -146,8 +146,8 @@ To evaluate Gervásio, the examples were randomly selected to be included in the
 | Model | ENEM 2022 (Accuracy) | BLUEX (Accuracy) | RTE (F1) | STS (Pearson) |
 |--------------------------|----------------------|------------------|-----------|---------------|
 | **Gervásio 7B PTBR** | 0.1977 | 0.2640 | **0.7469** | **0.2136** |
-| **LLaMA-2** | 0.2458 | 0.2903 | 0.0913 | 0.1034 |
-| **LLaMA-2 Chat** | 0.2231 | 0.2959 | 0.5546 | 0.1750 |
+| **LLaMA-2 (English)** | 0.2458 | 0.2903 | 0.0913 | 0.1034 |
+| **LLaMA-2 Chat (English)** | 0.2231 | 0.2959 | 0.5546 | 0.1750 |
 ||||||
 | **Sabiá-7B** | **0.6017** | **0.7743** | 0.6847 | 0.1363 |