BERT medium (cased) model trained on a 125M-token subset of cc100-Swahili for our work "Scaling Laws for BERT in Low-Resource Settings", published in Findings of ACL 2023.
The model has 51M parameters (8 layers) and a vocabulary size of 50K. It was trained for 500K steps with a sequence length of 512 tokens and a batch size of 256.
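As a rough sanity check on the figures above, the following sketch recomputes the parameter count and the training token budget. It is an illustration under stated assumptions, not the authors' configuration: the hidden size (512) and feed-forward size (2048) are taken from the commonly used BERT-medium layout and do not appear in this card, the vocabulary is assumed to be exactly 50,000 entries, and the count covers the encoder plus pooler without a masked-language-model head. The function name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate,
                     max_positions=512, type_vocab=2):
    """Count the weights of a BERT encoder with pooler (no MLM head)."""
    layer_norm = 2 * hidden  # scale + bias
    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (with biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (with biases) plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# Values stated in the card: 8 layers, 50K vocabulary, sequence length 512.
# ASSUMPTION: hidden=512 and intermediate=2048 (usual BERT-medium sizes).
n_params = bert_param_count(vocab_size=50_000, hidden=512,
                            layers=8, intermediate=2048)
print(f"{n_params:,}")  # 51,345,920, i.e. roughly 51M

# Upper bound on token positions processed during training:
# steps * batch size * sequence length (assumes every sequence is full).
steps, batch_size, seq_len = 500_000, 256, 512
tokens_processed = steps * batch_size * seq_len
print(f"{tokens_processed:,}")  # 65,536,000,000
```

Under these assumptions the count lands at about 51M, consistent with the figure stated above, and roughly half of those weights sit in the embedding matrix. The token budget is only an upper bound, since padded positions and the exact tokenization of the 125M-token corpus are not described here.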
Results
Task | bert-base-sw | bert-medium-sw | Flair | mBERT | SwahBERT |
---|---|---|---|---|---|
NERC | 92.09 | 91.63 | 92.04 | 91.17 | 88.60 |
Topic | 93.07 | 92.88 | 91.83 | 91.52 | 90.90 |
Sentiment | 79.04 | 77.07 | 73.60 | 69.17 | 71.12 |
QNLI | 63.34 | 63.87 | 52.82 | 63.48 | 64.72 |
Authors
Gorka Urbizu [1], Iñaki San Vicente [1], Xabier Saralegi [1], Rodrigo Agerri [2] and Aitor Soroa [2]
Affiliation of the authors:
[1] Orai NLP Technologies
[2] HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Licensing
The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Acknowledgements
If you use this model, please cite the following paper:
- G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. Scaling Laws for BERT in Low-Resource Settings. Findings of the Association for Computational Linguistics: ACL 2023. July 2023, Toronto, Canada.
Contact information
Gorka Urbizu, Iñaki San Vicente: {g.urbizu,i.sanvicente}@orai.eus