jarodrigues committed
Commit 450aa46 · 1 Parent(s): 8716635
Update README.md
README.md CHANGED
@@ -14,8 +14,6 @@ tags:
 - bertimbau
 license: mit
 datasets:
-- oscar
-- brwac
 - europarl_bilingual
 - PORTULAN/glue-ptpt
 - PORTULAN/parlamento-pt

@@ -92,7 +90,7 @@ This model is distributed free of charge under the [MIT](https://choosealicense.
 - [ParlamentoPT](https://www.parlamento.pt/): the ParlamentoPT is a data set we obtained by gathering the publicly available documents with the transcription of the debates in the Portuguese Parliament.


-**Albertina PT-BR
+[**Albertina PT-BR**](https://huggingface.co/PORTULAN/albertina-ptbr), in turn, was trained over the [BrWac](https://huggingface.co/datasets/brwac) data set.


 ## Preprocessing

@@ -111,7 +109,7 @@ Similarly to the PT-BR variant above, we opted for a learning rate of 1e-5 with
 However, since the number of training examples is approximately twice of that in the PT-BR variant, we reduced the number of training epochs to half and completed only 25 epochs, which resulted in approximately 245k steps.
 The model was trained for 3 days on a2-highgpu-8gb Google Cloud A2 VMs with 8 GPUs, 96 vCPUs and 680 GB of RAM.

-To train **Albertina
+To train [**Albertina PT-BR**](https://huggingface.co/PORTULAN/albertina-ptbr) the BrWac data set was tokenized with the original DeBERTA tokenizer with a 128 token sequence truncation and dynamic padding.
 The model was trained using the maximum available memory capacity resulting in a batch size of 896 samples (56 samples per GPU without gradient accumulation steps).
 We chose a learning rate of 1e-5 with linear decay and 10k warm-up steps based on the results of exploratory experiments.
 In total, around 200k training steps were taken across 50 epochs.
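
The line added in the last hunk describes how the BrWac corpus was prepared for the PT-BR variant: sequences truncated at 128 tokens, with padding applied dynamically per batch. The sketch below illustrates that kind of preprocessing with the Hugging Face `transformers` API; it is not the released training code, and the checkpoint name, the `text` column, and the masking probability are assumptions made for illustration only.

```python
# Illustrative sketch only (not the released Albertina training code).
# Assumptions: a DeBERTa V2 checkpoint name, a "text" column, and a 15% MLM
# masking probability, none of which are stated in the diff above.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")

def tokenize(batch):
    # Truncate each example to 128 tokens; no padding is applied here, so the
    # collator can pad each batch only to its longest sequence (dynamic padding).
    return tokenizer(batch["text"], truncation=True, max_length=128)

# The collator pads dynamically and builds masked-language-modelling labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
```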
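The hyper-parameters quoted for the PT-BR variant (learning rate of 1e-5 with linear decay, 10k warm-up steps, 56 samples per GPU without gradient accumulation, 50 epochs) map onto standard `transformers` training arguments roughly as follows. The output directory is hypothetical, and whether the authors used the `Trainer` API at all is an assumption of this sketch.

```python
# Rough mapping of the quoted hyper-parameters onto TrainingArguments.
# The output directory is a hypothetical name for illustration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="albertina-ptbr-pretraining",  # hypothetical path
    per_device_train_batch_size=56,           # 56 samples per GPU, no gradient accumulation
    learning_rate=1e-5,                       # as stated in the README
    lr_scheduler_type="linear",               # linear decay
    warmup_steps=10_000,                      # 10k warm-up steps
    num_train_epochs=50,                      # ~200k steps across 50 epochs for PT-BR
)
```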