Happyb committed (verified)
Commit 2155474 · 1 Parent(s): 8cb3929

Update README.md

Files changed (1): README.md (+7 -5)
README.md CHANGED
@@ -11,8 +11,12 @@ Lugha-Llama is an Africa-centric language model developed through continual pret
 languages commonly spoken on the African continent.
 
 To train the model, we sample as uniformly as possible across languages, limiting how many times data is repeated and upsampling rare languages by at most four epochs.
-We combine [WURA data](https://huggingface.co/datasets/castorini/wura) with high-quality English documents from [FineWeb-Edu](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1) and [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math), which results in the improved Lugha-Llama-Edu and Lugha-Llama-Maths models, respectively.
-On the challenging [IrokoBench](https://huggingface.co/collections/masakhane/irokobench-665a21b6d4714ed3f81af3b1) dataset, our models consistently achieve the best performance among similarly sized baselines. In a separate ablation experiment, we translate English education documents to Swahili to study whether the performance gains from FineWeb-Edu data are due to its content or to its English source language.
+We combine [WURA data](https://huggingface.co/datasets/castorini/wura) with high-quality English documents from [FineWeb-Edu](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1) and [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math), which results in the improved Lugha-Llama-Edu and Lugha-Llama-Maths models, respectively.
+Our models consistently achieve the best performance among similarly sized baselines.
+
+In a separate ablation experiment, we translate English education documents to Swahili to study whether the performance gains from FineWeb-Edu data are due to its content or to its English source language.
+* Translated Swahili data (200M tokens): [FineWeb_Edu-swahili-translated](https://huggingface.co/datasets/princeton-nlp/fineweb_edu-swahili-translated)
+
 
 We demonstrate the findings in our paper [Adapting Large Language Models for African Languages:
 The Lugha-Llama Model]()
@@ -22,9 +26,6 @@ Authors: [Happy Buzaaba](https://buzaabah.github.io/)\*, [Alexander Wettig](http
 contact {happy.buzaaba@, awettig@cs}princeton.edu
 
 
-* Translated Swahili data 200M tokens: [FineWeb_Edu-swahili-translated](https://huggingface.co/datasets/princeton-nlp/fineweb_edu-swahili-translated)
-
-
 ## Lugha-Llama models
 
 * [Lugha-Llama/Lugha-Llama-8B-wura](https://huggingface.co/Lugha-Llama/Lugha-Llama-8B-wura)
@@ -35,3 +36,4 @@ contact {happy.buzaaba@, awettig@cs}princeton.edu
 
 
 
+
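To make the sampling rule in the README concrete: near-uniform shares across languages with an at-most-four-epoch repetition cap behaves like a water-filling allocation over corpus sizes. The sketch below is an illustration only, assuming a hypothetical `mixing_weights` helper and made-up corpus sizes; it is not taken from the Lugha-Llama training code.

```python
# Illustrative water-filling allocation for the mixing rule described in
# the README: split a token budget as uniformly as possible across
# languages, but never repeat a language's corpus more than `max_epochs`
# (four) times. All names and numbers here are hypothetical.

def mixing_weights(tokens_per_lang: dict[str, float],
                   budget: float,
                   max_epochs: float = 4.0) -> dict[str, float]:
    """Return the number of tokens to draw per language under an epoch cap."""
    alloc: dict[str, float] = {}
    remaining = budget
    open_langs = set(tokens_per_lang)
    while open_langs:
        share = remaining / len(open_langs)
        # Languages too small to absorb an equal share are capped at
        # max_epochs passes over their data ...
        capped = {l for l in open_langs
                  if tokens_per_lang[l] * max_epochs <= share}
        if not capped:
            # ... and the remaining languages split the leftover budget evenly.
            for l in open_langs:
                alloc[l] = share
            break
        for l in capped:
            alloc[l] = tokens_per_lang[l] * max_epochs
            remaining -= alloc[l]
        open_langs -= capped
    return alloc

# Hypothetical corpus sizes in billions of tokens:
corpus = {"swa": 2.0, "hau": 1.0, "amh": 0.5, "yor": 0.1}
print(mixing_weights(corpus, budget=4.0))
# -> yor is capped at 0.4 (4 epochs x 0.1B); swa, hau, amh get 1.2B each.
```

Under this scheme, high-resource languages are sampled for less than one epoch while the rarest ones are repeated up to the four-epoch cap, matching the "as uniformly as possible" description above.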