
# RoBERTa Pretrained for Tigrinya Language

We pretrain a RoBERTa Base model for Tigrinya on a relatively small dataset (34M tokens) for 18 epochs.

This card contains a PyTorch model exported from the original checkpoint, which was trained on a TPU v3-8 with Flax.
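
Below is a minimal loading sketch using the Hugging Face Transformers library. It assumes the checkpoint is hosted under the repository id `fgaim/tiroberta-base` (the repository this card lives in), and the Tigrinya input sentence is only a placeholder.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumption: the exported PyTorch checkpoint is available under this repository id.
MODEL_ID = "fgaim/tiroberta-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Placeholder Tigrinya sentence with one masked token for the MLM head to fill.
text = f"ትግርኛ {tokenizer.mask_token} ቋንቋ እዩ።"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch size, sequence length, vocabulary size)
```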

## Hyperparameters

The hyperparameters for the model size mentioned above are as follows:

| Model Size | L  | AH | HS  | FFN  | P    |
|------------|----|----|-----|------|------|
| BASE       | 12 | 12 | 768 | 3072 | 125M |

(L = number of layers; AH = number of attention heads; HS = hidden size; FFN = feedforward network dimension; P = number of parameters.)
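
As a concrete illustration, the BASE row above corresponds to a configuration along these lines. This is a sketch only: the listed dimensions come from the table, while everything else (e.g. `vocab_size`) is left at library defaults and may differ from the actual checkpoint.

```python
from transformers import RobertaConfig

# Sketch of a RoBERTa-Base configuration matching the table above.
# Only the four listed dimensions are taken from this card; all other
# fields are library defaults and are assumptions, not card values.
config = RobertaConfig(
    num_hidden_layers=12,     # L   = number of layers
    num_attention_heads=12,   # AH  = number of attention heads
    hidden_size=768,          # HS  = hidden size
    intermediate_size=3072,   # FFN = feedforward network dimension
)
print(config)
```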