RoBERTa Pretrained for Tigrinya Language

We pretrain a RoBERTa Base model for Tigrinya on a relatively small dataset (34M tokens) for 18 epochs.

This card provides a PyTorch model exported from the original Flax model, which was trained on a TPU v3-8.

Hyperparameters

The hyperparameters for the model size mentioned above are as follows:

Model Size	L	AH	HS	FFN	P
BASE	12	12	768	3072	125M

(L = number of layers; AH = number of attention heads; HS = hidden size; FFN = feedforward network dimension; P = number of parameters.)
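
As a rough sanity check on the parameter count, the sketch below tallies the weights and biases of a RoBERTa-style encoder from the hyperparameters in the table. The function name `roberta_encoder_params` is hypothetical, and the vocabulary size (50265) and number of position embeddings (514) are assumptions borrowed from the original English RoBERTa Base; this card does not state them, so the Tigrinya model's actual values may differ. The count excludes the pooler and the masked-language-modeling head.

```python
def roberta_encoder_params(layers, hidden, ffn, vocab_size, max_positions, type_vocab=1):
    """Count weights and biases in a RoBERTa-style encoder (no pooler, no LM head)."""
    layer_norm = 2 * hidden  # scale and bias vectors
    # Token, position, and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> ffn -> hidden), then LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm
    return embeddings + layers * (attention + feed_forward)


# L, HS, and FFN come from the table above.
# ASSUMPTION: vocab_size and max_positions are the English RoBERTa Base defaults,
# not values stated in this card.
total = roberta_encoder_params(
    layers=12, hidden=768, ffn=3072, vocab_size=50265, max_positions=514
)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the total comes to roughly 124M, in line with the 125M listed in the table. The number of attention heads (AH) does not appear in the calculation because splitting the hidden dimension across heads leaves the projection matrix sizes unchanged.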