jarodrigues committed • Commit 7c17f88 • 1 parent: 3c821e4

Update README.md

README.md CHANGED
@@ -128,6 +128,31 @@ You can use this model directly with a pipeline for masked language modeling:
 
 ```
 
+The model can be used by fine-tuning it for a specific task:
+
+```python
+>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
+>>> from datasets import load_dataset
+
+>>> model = AutoModelForSequenceClassification.from_pretrained("PORTULAN/albertina-pt-pt", num_labels=2)
+>>> tokenizer = AutoTokenizer.from_pretrained("PORTULAN/albertina-pt-pt")
+>>> dataset = load_dataset("PORTULAN/glueptpt", "rte")
+
+>>> def tokenize_function(examples):
+...     return tokenizer(examples["text"], padding="max_length", truncation=True)
+>>> tokenized_datasets = dataset.map(tokenize_function, batched=True)
+
+>>> training_args = TrainingArguments(output_dir="albertina-pt-pt-rte", evaluation_strategy="epoch")
+>>> trainer = Trainer(
+...     model=model,
+...     args=training_args,
+...     train_dataset=tokenized_datasets["train"],
+...     eval_dataset=tokenized_datasets["validation"],
+... )
+>>> trainer.train()
+
+```
 
 # Citation
 
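For orientation only: the hunk header references the README's earlier fill-mask pipeline example, whose body is elided from this hunk (only its closing fence appears as context). A minimal illustrative sketch of such a call, assuming the standard transformers fill-mask pipeline and that the model uses [MASK] as its mask token, could look like:

```python
>>> from transformers import pipeline

>>> # Illustrative sketch only; the actual README example is not shown in this diff.
>>> # Assumes a DeBERTa-style tokenizer whose mask token is [MASK].
>>> unmasker = pipeline("fill-mask", model="PORTULAN/albertina-pt-pt")
>>> unmasker("A língua portuguesa é falada em vários [MASK] do mundo.")
```

The pipeline returns a list of candidate completions for the masked position, each with a score, the predicted token, and the filled-in sequence.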