BramVanroy committed
Commit 0f00565
Parent: d63e1d1

Update README.md

Files changed (1)
1. README.md (+7, -9)
README.md CHANGED
@@ -36,24 +36,22 @@ inference: false
</p>
</blockquote>

+ Fietje is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B tokens. It is small and efficient, with a size of 2.7 billion parameters, while performing almost on par with more powerful Dutch LLMs of twice its size, such as [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).

-
- This model is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), finetuned for Dutch text generation. It was continue-pretrained on 28B Dutch tokens, which includes the full Dutch component of Wikipedia (accounting for around 15%), supplemented with Dutch tokens from CulturaX. A newer version of this dataset can be found [here](https://huggingface.co/datasets/BramVanroy/wikipedia_culturax_dutch), which also describes the filtering that took place.
-
- ## Model description
-
- More information needed
+ A thorough description of the creation and evaluation of Fietje, as well as usage examples, is available in [this GitHub repository](https://github.com/BramVanroy/fietje).

## Intended uses & limitations

- More information needed
+ The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

- ## Training and evaluation data
+ ## Training data

- More information needed
+ Fietje was continue-pretrained on 28B Dutch tokens, which includes the full Dutch component of Wikipedia (accounting for around 15%), supplemented with Dutch tokens from CulturaX. A newer version of this dataset can be found [here](https://huggingface.co/datasets/BramVanroy/wikipedia_culturax_dutch), which also describes the filtering that took place to ensure high data quality.

## Training procedure

+ I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power for this project. Accounting for time spent waiting for jobs, training took around two weeks on four nodes of 4x A100 80GB GPUs each (16 GPUs in total).
+
### Training hyperparameters

The following hyperparameters were used during training:
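
The updated README points readers to the GitHub repository for real usage examples. As a minimal sketch only, the snippet below shows how a model like this could be run with the standard `transformers` text-generation pipeline. The model ID `BramVanroy/fietje-2` and the sampling settings are assumptions (the diff does not name the repository); substitute the actual model ID of this repo.

```python
# Minimal sketch, not the official usage example (see the Fietje GitHub repo for those).
# Assumption: the model ID is "BramVanroy/fietje-2"; adjust if the repository name differs.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="BramVanroy/fietje-2",   # assumed model ID, not stated in the diff
    torch_dtype=torch.bfloat16,    # half precision keeps the 2.7B model light on memory
    device_map="auto",             # place the model on GPU if one is available
)

# The card describes continued pretraining (no instruction tuning), so the model is
# prompted as a plain Dutch text completer rather than as a chat assistant.
prompt = "Het mooiste aan de Nederlandse taal is"
outputs = generator(
    prompt,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.95,
    temperature=0.7,
)
print(outputs[0]["generated_text"])
```

The sampling parameters above are illustrative defaults, not values recommended by the model card.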