jarodrigues committed · Commit fa0de9d · Parent(s): 10dbd51
Update README.md

README.md CHANGED
@@ -11,6 +11,7 @@ datasets:
 - oscar
 - brwac
 - europarl
+- glue-ptpt
 widget:
 - text: "A culinária portuguesa é rica em sabores e [MASK], tornando-se um dos maiores tesouros do país."
 ---
@@ -135,7 +136,7 @@ We address four tasks from those in PLUE, namely:
 | **Albertina-PT-PT** | **0.7960** | 0.4507 | **0.9151** | 0.8799 |
 
 
-We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/
+We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glue-ptpt), a **PT-PT version of the GLUE** benchmark.
 We automatically translated the same four tasks from GLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
 
 | Model | RTE (Accuracy) | WNLI (Accuracy) | MRPC (F1) | STS-B (Pearson) |
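For readers following the change: the new sentence ties the evaluation section to the GLUE-PT dataset. A minimal sketch of loading one of its four tasks, assuming the dataset exposes GLUE-style configuration names ("rte" is confirmed by the snippet later in this diff; the split names are assumed to follow the standard GLUE defaults):

```python
from datasets import load_dataset

# Load the RTE task of GLUE-PT (the PT-PT translation of GLUE via DeepL).
# The "rte" configuration name comes from the snippet later in this diff;
# the train/validation split names are assumed GLUE defaults.
dataset = load_dataset("PORTULAN/glue-ptpt", "rte")
print(dataset["train"][0])  # one premise/hypothesis pair with its label
```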
@@ -172,7 +173,7 @@ The model can be used by fine-tuning it for a specific task:
 
 >>> model = AutoModelForSequenceClassification.from_pretrained("PORTULAN/albertina-ptpt", num_labels=2)
 >>> tokenizer = AutoTokenizer.from_pretrained("PORTULAN/albertina-ptpt")
->>> dataset = load_dataset("PORTULAN/
+>>> dataset = load_dataset("PORTULAN/glue-ptpt", "rte")
 
 >>> def tokenize_function(examples):
 ...     return tokenizer(examples["sentence1"], examples["sentence2"], padding="max_length", truncation=True)