Commit fa0de9d by jarodrigues · 1 Parent(s): 10dbd51

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -11,6 +11,7 @@ datasets:
  - oscar
  - brwac
  - europarl
+ - glue-ptpt
  widget:
  - text: "A culinária portuguesa é rica em sabores e [MASK], tornando-se um dos maiores tesouros do país."
  ---
@@ -135,7 +136,7 @@ We address four tasks from those in PLUE, namely:
  | **Albertina-PT-PT** | **0.7960** | 0.4507 | **0.9151**| 0.8799 |
 
 
- We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glueptpt), a **PT-PT version of the GLUE** benchmark.
+ We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glue-ptpt), a **PT-PT version of the GLUE** benchmark.
  We automatically translated the same four tasks from GLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
 
  | Model | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) |
@@ -172,7 +173,7 @@ The model can be used by fine-tuning it for a specific task:
 
  >>> model = AutoModelForSequenceClassification.from_pretrained("PORTULAN/albertina-ptpt", num_labels=2)
  >>> tokenizer = AutoTokenizer.from_pretrained("PORTULAN/albertina-ptpt")
- >>> dataset = load_dataset("PORTULAN/glueptpt", "rte")
+ >>> dataset = load_dataset("PORTULAN/glue-ptpt", "rte")
 
  >>> def tokenize_function(examples):
  ...     return tokenizer(examples["sentence1"], examples["sentence2"], padding="max_length", truncation=True)