Fill-Mask · Transformers · PyTorch · Portuguese · deberta-v2 · albertina-pt* · albertina-100m-portuguese-ptpt · albertina-100m-portuguese-ptbr · albertina-900m-portuguese-ptpt · albertina-900m-portuguese-ptbr · albertina-1b5-portuguese-ptpt · albertina-1b5-portuguese-ptbr · bert · deberta · portuguese · encoder · foundation model · Inference Endpoints
jarodrigues committed
Commit 19ecde1 • 1 Parent(s): e9a627b
Update README.md
README.md
CHANGED
@@ -115,8 +115,8 @@ As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/micr
 
 To train **Albertina 1.5B PTBR 256**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence
 truncation and dynamic padding for 250k steps and a 256-token sequence-truncation for 80k steps.
-These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences
-input sequences
+These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences and 24 hours of computation for the 256-token
+input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 <br>
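For readers who want to see what the hyperparameters quoted in this change look like in practice, below is a minimal, hypothetical sketch using the Hugging Face `transformers` Trainer API. It is not the authors' training code: the `microsoft/deberta-v2-xxlarge` checkpoint id, the toy corpus, and the batch size are assumptions for illustration; only the sequence truncation, dynamic padding, step counts, learning rate, linear decay, and 10k warm-up steps come from the README text above.

```python
# Hypothetical sketch (not the authors' script) of the reported MLM training setup.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumption: the "original DeBERTa tokenizer" and DeBERTa V2 xxlarge codebase
# referred to in the README correspond to this public checkpoint.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v2-xxlarge")

# Toy stand-in corpus; the actual PT-BR data set is not reproduced here.
corpus = Dataset.from_dict({"text": ["A Albertina é um modelo de linguagem."] * 1024})

def tokenize(batch, max_length=128):
    # 128-token sequence truncation for the first training phase
    # (switch max_length to 256 for the 80k-step second phase).
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=max_length,
        return_special_tokens_mask=True,
    )

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic padding: the MLM collator pads each batch to its longest sequence.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15  # collator default
)

args = TrainingArguments(
    output_dir="albertina-mlm-sketch",
    max_steps=250_000,              # 250k steps for the 128-token phase (80k at 256 tokens)
    learning_rate=1e-5,             # reported learning rate
    lr_scheduler_type="linear",     # linear decay
    warmup_steps=10_000,            # 10k warm-up steps
    per_device_train_batch_size=8,  # assumption; not reported in this excerpt
)

trainer = Trainer(
    model=model, args=args, train_dataset=tokenized, data_collator=collator
)
# trainer.train()  # left commented out: full training needs the A2 GPU setup described above
```

Setting `max_length=256` in `tokenize` and `max_steps=80_000` in `TrainingArguments` would correspond to the second, 256-token phase described in the diff.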