jarodrigues committed
Commit 2dc7d77
Parent: fa515dc

Update README.md

Files changed (1):
README.md (+2 -14)
README.md CHANGED

@@ -112,30 +112,18 @@ The model was trained for one day on a2-megagpu-16gb Google Cloud A2 VMs with 16
 
 # Evaluation
 
-The two base model versions were evaluated on downstream tasks, namely the translations into PT-BR and PT-PT of the English data sets used for a few of the tasks in the widely-used [GLUE benchmark](https://huggingface.co/datasets/glue), which allowed us to test both Albertina-PT-* Base variants on a wider variety of downstream tasks.
+The base model version was evaluated on downstream tasks, namely the translations into PT-PT of the English data sets used for a few of the tasks in the widely-used [GLUE benchmark](https://huggingface.co/datasets/glue).
 
 
 ## GLUE tasks translated
 
-We resort to [PLUE](https://huggingface.co/datasets/dlb/plue) (Portuguese Language Understanding Evaluation), a data set that was obtained by automatically translating GLUE into **PT-BR**.
-We address four tasks from those in PLUE, namely:
-- two similarity tasks: MRPC, for detecting whether two sentences are paraphrases of each other, and STS-B, for semantic textual similarity;
-- and two inference tasks: RTE, for recognizing textual entailment and WNLI, for coreference and natural language inference.
-
-
-| Model                    | RTE (Accuracy) | WNLI (Accuracy) | MRPC (F1)  | STS-B (Pearson) |
-|--------------------------|----------------|-----------------|------------|-----------------|
-| **Albertina-PT-BR Base** | 0.6462         | **0.5493**      | 0.8779     | 0.8501          |
-| **Albertina-PT-PT Base** | **0.6643**     | 0.4366          | **0.8966** | **0.8608**      |
-
 
 We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glue-ptpt), a **PT-PT version of the GLUE** benchmark.
 We automatically translated the same four tasks from GLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
 
 | Model                    | RTE (Accuracy) | WNLI (Accuracy) | MRPC (F1)  | STS-B (Pearson) |
 |--------------------------|----------------|-----------------|------------|-----------------|
-| **Albertina-PT-PT Base** | **0.6787**     | 0.4507          | 0.8829     | **0.8581**      |
-| **Albertina-PT-BR Base** | 0.6570         | **0.5070**      | **0.8900** | 0.8516          |
+| **Albertina-PT-PT Base** | 0.6787         | 0.4507          | 0.8829     | 0.8581          |
 
 <br>
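For readers who want to run this kind of evaluation themselves, below is a minimal sketch of scoring a classifier on the RTE task of the GLUE-PT dataset mentioned in the diff. This is not the authors' evaluation code: the `"rte"` config name, the GLUE-style column names (`sentence1`, `sentence2`, `label`), and the model id `your-finetuned-albertina-rte` are assumptions to be checked against the dataset and model cards on the Hub.

```python
# Minimal sketch, not the authors' pipeline. Assumes GLUE-PT exposes an "rte"
# config with GLUE-style columns and that you already have a fine-tuned
# checkpoint; the config name and model id are assumptions, not confirmed.
import evaluate
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "your-finetuned-albertina-rte"  # hypothetical checkpoint id

dataset = load_dataset("PORTULAN/glue-ptpt", "rte", split="validation")  # assumed config/split
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# RTE is reported as accuracy in the tables above.
metric = evaluate.load("accuracy")
for batch in dataset.iter(batch_size=32):
    inputs = tokenizer(batch["sentence1"], batch["sentence2"],
                       truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    metric.add_batch(predictions=logits.argmax(dim=-1), references=batch["label"])

print(metric.compute())  # e.g. {"accuracy": ...}
```

The same loop covers MRPC and WNLI by swapping the config name and metric; STS-B would instead use a single-logit regression head and a Pearson correlation metric.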