---
language:
- ar
datasets:
- mc4
- oscar
- arabic_billion_words
---
# arabic-t5-small

This is a T5 v1.1 (small) model trained on the concatenation of the Arabic Billion Words corpus and the Arabic subsets of the mC4 and OSCAR datasets. Due to time limitations, the model was trained on only about 10% of the combined dataset.
## Training parameters

| Parameter | Value |
| --- | --- |
| Training steps | 22'000 |
| Training batch size | 384 |
| Evaluation batch size | 768 |
| Learning rate | 1e-2 |
| dtype | `jnp.float32` |
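The "about 10%" figure can be sanity-checked with a quick back-of-the-envelope calculation from the parameters above, assuming each training step consumes exactly one full batch of examples:

```python
# Back-of-the-envelope estimate of how many training examples the model saw.
steps = 22_000            # training steps from the table above
train_batch_size = 384    # examples per step

examples_seen = steps * train_batch_size
print(f"{examples_seen:,} examples")  # → 8,448,000 examples
```

Roughly 8.4 million examples, which is consistent with a single partial pass over a dataset on the order of ~85 million examples.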