orai-nlp/bert-medium-sw

BERT medium (cased) model trained on a 125M-token subset of cc100-Swahili for our paper "Scaling Laws for BERT in Low-Resource Settings", published in Findings of ACL 2023.
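A minimal sketch of how the checkpoint might be queried with the Hugging Face `transformers` library. It is not taken from the paper: it assumes `transformers` and a backend such as PyTorch are installed, that the repository name above is the model ID on the Hub, and that the published weights include the masked-language-modelling head. The helper name `predict_masked` and the `{mask}` placeholder are hypothetical, introduced only for illustration.

```python
MODEL_ID = "orai-nlp/bert-medium-sw"  # repository name from this model card


def predict_masked(template: str, top_k: int = 5):
    """Fill the `{mask}` placeholder in `template`; return (token, score) pairs.

    Hypothetical helper: assumes the checkpoint ships a masked-LM head.
    """
    # Imported lazily so that defining this helper triggers no download.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Use the tokenizer's own mask token rather than hard-coding "[MASK]".
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    predictions = fill_mask(text, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

Calling `predict_masked("Nairobi ni mji mkuu wa {mask}.")` would download the weights on first use and return the five highest-scoring fillers. No output is shown here because it depends on the downloaded checkpoint.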

The model has 51M parameters (8 layers) and a vocabulary size of 50K. It was trained for 500K steps with a sequence length of 512 tokens and a batch size of 256.
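To put these training figures in perspective, the short calculation below gives an upper bound on the number of token positions processed during pre-training. It is a back-of-the-envelope sketch, not a figure reported by the authors: it assumes every sequence is packed to the full 512 positions, and the comparison with the corpus size is only rough, since the 125M corpus tokens and the model's subword tokens may not be counted in the same way.

```python
# Figures taken from the model description.
STEPS = 500_000
BATCH_SIZE = 256
SEQ_LEN = 512
CORPUS_TOKENS = 125_000_000


def max_positions_processed(steps: int, batch_size: int, seq_len: int) -> int:
    """Upper bound on token positions seen, assuming fully packed sequences."""
    return steps * batch_size * seq_len


total = max_positions_processed(STEPS, BATCH_SIZE, SEQ_LEN)
# Rough number of passes over the corpus (assumes comparable token units).
passes = total / CORPUS_TOKENS

print(f"{total:,}")     # 65,536,000,000
print(f"{passes:.1f}")  # 524.3
```

Under these assumptions the model sees about 65.5 billion token positions, which corresponds to several hundred passes over the 125M-token subset.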

Results

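| Task | bert-base-sw | bert-medium-sw | Flair | mBERT | SwahBERT |
|---|---|---|---|---|---|
| NERC | 92.09 | 91.63 | 92.04 | 91.17 | 88.60 |
| Topic | 93.07 | 92.88 | 91.83 | 91.52 | 90.90 |
| Sentiment | 79.04 | 77.07 | 73.60 | 69.17 | 71.12 |
| QNLI | 63.34 | 63.87 | 52.82 | 63.48 | 64.72 |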

Gorka Urbizu [1], Iñaki San Vicente [1], Xabier Saralegi [1], Rodrigo Agerri [2] and Aitor Soroa [2]

Affiliation of the authors:

[1] Orai NLP Technologies

[2] HiTZ Center - Ixa, University of the Basque Country UPV/EHU

The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

If you use this model, please cite the following paper:

G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. Scaling Laws for BERT in Low-Resource Settings. Findings of the Association for Computational Linguistics: ACL 2023. July 2023. Toronto, Canada.

Gorka Urbizu, Iñaki San Vicente: {g.urbizu,i.sanvicente}@orai.eus