BOTS-LM
This collection hosts the models and datasets released as part of BOTS-LM, the Bilingual Open Tswana Suite of Language Models.
- Paper: 2408.02239
OxxoCodes/Pula-8B-v0.1
Text Generation. The current (but not final) version of Pula-8B, currently recommended for most use cases. It excels at translation and reasoning tasks.
OxxoCodes/Pula-XLMR-large-v0.1
Fill-Mask. The current (but not final) version of Pula-XLMR-Large, currently recommended for most use cases.
OxxoCodes/Medupe
Dataset (about 976k rows). An instruction-tuning dataset consisting of existing instruction datasets, datasets converted to a chat format, translation data, subsets of popular instruction datasets translated with GPT-4o and Gemini 1.5 Pro, and purely synthetic instruction data.
OxxoCodes/Marothodi
Dataset (about 152k rows). Contains raw web text and other documents written in Setswana or in code-switched Setswana and English.
OxxoCodes/mmlu-tsn
Dataset (about 14k rows). The entire MMLU test split, translated into Setswana with GPT-4o.
OxxoCodes/gsm8k-tsn
Dataset (about 1.32k rows). The entire GSM8K test split, translated into Setswana with Gemini 1.5 Pro.