jarodrigues committed
Commit 2dc7d77 • 1 Parent(s): fa515dc
Update README.md

README.md CHANGED
```diff
@@ -112,30 +112,18 @@ The model was trained for one day on a2-megagpu-16gb Google Cloud A2 VMs with 16
 
 # Evaluation
 
-The
+The base model version was evaluated on downstream tasks, namely the translations into PT-PT of the English data sets used for a few of the tasks in the widely-used [GLUE benchmark](https://huggingface.co/datasets/glue).
 
 
 ## GLUE tasks translated
 
-We resort to [PLUE](https://huggingface.co/datasets/dlb/plue) (Portuguese Language Understanding Evaluation), a data set that was obtained by automatically translating GLUE into **PT-BR**.
-We address four tasks from those in PLUE, namely:
-- two similarity tasks: MRPC, for detecting whether two sentences are paraphrases of each other, and STS-B, for semantic textual similarity;
-- and two inference tasks: RTE, for recognizing textual entailment and WNLI, for coreference and natural language inference.
-
-
-| Model                    | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) |
-|--------------------------|----------------|----------------|-----------|-----------------|
-| **Albertina-PT-BR Base** | 0.6462 | **0.5493** | 0.8779 | 0.8501 |
-| **Albertina-PT-PT Base** | **0.6643** | 0.4366 | **0.8966** | **0.8608** |
-
 
 We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glue-ptpt), a **PT-PT version of the GLUE** benchmark.
 We automatically translated the same four tasks from GLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
 
 | Model                    | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) |
 |--------------------------|----------------|----------------|-----------|-----------------|
-| **Albertina-PT-PT Base** |
-| **Albertina-PT-BR Base** | 0.6570 | **0.5070** | **0.8900** | 0.8516 |
+| **Albertina-PT-PT Base** | 0.6787 | 0.4507 | 0.8829 | 0.8581 |
 
 <br>
 
```
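For readers who want to reproduce numbers like those in the updated table, the sketch below shows one way to score a model on the four GLUE-PT tasks. It is a minimal illustration, not the authors' evaluation code: it assumes the GLUE-PT configuration and column names mirror the English GLUE dataset (`rte`, `wnli`, `mrpc`, `stsb`, with `sentence1`/`sentence2`/`label` columns), and `predict` is a hypothetical stand-in for a fine-tuned Albertina checkpoint.

```python
# Minimal sketch: score predictions on the validation split of the four
# GLUE-PT tasks reported in the table. Assumptions (not from the commit):
# config and column names mirror the English GLUE dataset, and `predict`
# is a hypothetical callable standing in for a fine-tuned model.
from datasets import load_dataset
from scipy.stats import pearsonr
from sklearn.metrics import accuracy_score, f1_score

# Metric per task, matching the table's column headers:
# Accuracy for RTE and WNLI, F1 for MRPC, Pearson correlation for STS-B.
METRICS = {
    "rte":  lambda y, p: accuracy_score(y, p),
    "wnli": lambda y, p: accuracy_score(y, p),
    "mrpc": lambda y, p: f1_score(y, p),
    "stsb": lambda y, p: pearsonr(y, p)[0],
}

def evaluate_task(task: str, predict) -> float:
    """Return the task's headline metric for `predict` on GLUE-PT."""
    data = load_dataset("PORTULAN/glue-ptpt", task, split="validation")
    preds = [predict(ex["sentence1"], ex["sentence2"]) for ex in data]
    return METRICS[task](list(data["label"]), preds)

if __name__ == "__main__":
    # Trivial majority-class baseline for the three classification tasks;
    # STS-B is a regression task, so it is skipped in this toy example.
    for task in ("rte", "wnli", "mrpc"):
        print(task, evaluate_task(task, lambda s1, s2: 0))
```

The headline numbers in the diff would then come from swapping the toy baseline for the corresponding fine-tuned checkpoint's prediction function.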