---
language:
- ar
datasets:
- mc4
- oscar
- arabic_billion_words
---
# arabic-t5-small
This is a T5v1.1 (small) model trained on the concatenation of the Arabic Billion Words corpus and the Arabic subsets of the mC4 and OSCAR datasets. Due to time limitations, the model was trained on only about `10%` of the full dataset.
## Training parameters
| Parameter             | Value         |
| :-------------------: | :-----------: |
| Steps                 | `22,000`      |
| Training batch size   | `384`         |
| Evaluation batch size | `768`         |
| Learning rate         | `1e-2`        |
| dtype                 | `jnp.float32` |
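
As a rough sanity check on the figures above, the number of training sequences consumed can be estimated from the step count and the training batch size. This is only a sketch: it assumes one batch per optimizer step (no gradient accumulation), that `384` is the global batch size, and that the "about 10%" figure is measured in sequences. None of these assumptions is stated in this card, and the names below are illustrative.

```python
# Values taken from the training parameters table.
STEPS = 22_000
TRAIN_BATCH_SIZE = 384


def sequences_seen(steps: int, batch_size: int) -> int:
    """Sequences consumed, assuming exactly one batch per training step."""
    return steps * batch_size


seen = sequences_seen(STEPS, TRAIN_BATCH_SIZE)
print(f"Sequences seen during training: {seen:,}")

# Assumption (not from the card): the ~10% coverage is counted in sequences,
# so the implied size of the full dataset is roughly seen / 0.10.
ASSUMED_FRACTION = 0.10
implied_total = round(seen / ASSUMED_FRACTION)
print(f"Implied full dataset size: ~{implied_total:,} sequences")
```

Under these assumptions the run covers about 8.4 million sequences, which would put the full concatenated dataset at roughly 84 million sequences. Treat both numbers as order-of-magnitude estimates.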