jarodrigues committed
Commit fa0de9d
1 Parent(s): 10dbd51

Update README.md

Files changed (1):
1. README.md (+3 -2)
README.md CHANGED
@@ -11,6 +11,7 @@ datasets:
 - oscar
 - brwac
 - europarl
+- glue-ptpt
 widget:
 - text: "A culinária portuguesa é rica em sabores e [MASK], tornando-se um dos maiores tesouros do país."
 ---
@@ -135,7 +136,7 @@ We address four tasks from those in PLUE, namely:
 | **Albertina-PT-PT** | **0.7960** | 0.4507 | **0.9151**| 0.8799 |
 
 
-We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glueptpt), a **PT-PT version of the GLUE** benchmark.
+We resorted to [GLUE-PT](https://huggingface.co/datasets/PORTULAN/glue-ptpt), a **PT-PT version of the GLUE** benchmark.
 We automatically translated the same four tasks from GLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
 
 | Model | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) |
@@ -172,7 +173,7 @@ The model can be used by fine-tuning it for a specific task:
 
 >>> model = AutoModelForSequenceClassification.from_pretrained("PORTULAN/albertina-ptpt", num_labels=2)
 >>> tokenizer = AutoTokenizer.from_pretrained("PORTULAN/albertina-ptpt")
->>> dataset = load_dataset("PORTULAN/glueptpt", "rte")
+>>> dataset = load_dataset("PORTULAN/glue-ptpt", "rte")
 
 >>> def tokenize_function(examples):
 ...     return tokenizer(examples["sentence1"], examples["sentence2"], padding="max_length", truncation=True)
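For context, the third hunk touches one line of the README's fine-tuning example, which the diff only shows in part. Below is a minimal end-to-end sketch of that flow under the renamed dataset id; the model, tokenizer, dataset, and tokenization steps come from the README itself, while the `Trainer` setup and hyperparameters (`output_dir`, epoch count) are assumptions, not part of this commit.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# From the README: model with a 2-label classification head, its tokenizer,
# and the RTE task of the renamed GLUE-PT dataset.
model = AutoModelForSequenceClassification.from_pretrained(
    "PORTULAN/albertina-ptpt", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("PORTULAN/albertina-ptpt")
dataset = load_dataset("PORTULAN/glue-ptpt", "rte")

def tokenize_function(examples):
    # RTE examples pair two sentences; pad/truncate to the model's max length.
    return tokenizer(
        examples["sentence1"],
        examples["sentence2"],
        padding="max_length",
        truncation=True,
    )

tokenized = dataset.map(tokenize_function, batched=True)

# Assumed training setup, not from the README; tune for the task at hand.
args = TrainingArguments(output_dir="albertina-ptpt-rte", num_train_epochs=3)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()
```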