---
language:
- ar
datasets:
- mc4
- oscar
- arabic_billion_words
---
# arabic-t5-small

This is a T5 v1.1 (small) model trained on the concatenation of the Arabic Billion Words corpus and the Arabic subsets of the mC4 and OSCAR datasets. Due to time limitations, the model was trained on only about 10% of the combined dataset.
## Training parameters

| Parameter | Value |
| --- | --- |
| Training steps | 22'000 |
| Training batch size | 384 |
| Evaluation batch size | 768 |
| Learning rate | 1e-2 |
| dtype | `jnp.float32` |
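The "about 10%" figure can be sanity-checked with a quick back-of-the-envelope calculation from the parameters above, assuming each training step consumes exactly one full batch of examples:

```python
# Back-of-the-envelope estimate of how many training examples the model saw.
steps = 22_000            # training steps from the table above
train_batch_size = 384    # examples per step

examples_seen = steps * train_batch_size
print(f"{examples_seen:,} examples")  # → 8,448,000 examples
```

Roughly 8.4 million examples, which is consistent with a single partial pass over a dataset on the order of ~85 million examples.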