BramVanroy committed
Commit 0f00565
Parent: d63e1d1

Update README.md

Files changed (1)
1. README.md (+7, -9)
README.md CHANGED
@@ -36,24 +36,22 @@ inference: false
</p>
</blockquote>

+ Fietje is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B tokens. It is small and efficient, with a size of 2.7 billion parameters, while performing almost on par with more powerful Dutch LLMs of twice its size, such as [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).

-
- This model is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), finetuned for Dutch text generation. It was continue-pretrained on 28B Dutch tokens, which includes the full Dutch component of Wikipedia (accounting for around 15%), supplemented with Dutch tokens from CulturaX. A newer version of this dataset can be found [here](https://huggingface.co/datasets/BramVanroy/wikipedia_culturax_dutch), which also describes the filtering that took place.
-
- ## Model description
-
- More information needed
+ A thorough description of the creation and evaluation of Fietje, as well as usage examples, is available in [this GitHub repository](https://github.com/BramVanroy/fietje).

## Intended uses & limitations

- More information needed
+ The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

- ## Training and evaluation data
+ ## Training data

- More information needed
+ Fietje was continue-pretrained on 28B Dutch tokens, which includes the full Dutch component of Wikipedia (accounting for around 15%), supplemented with Dutch tokens from CulturaX. A newer version of this dataset can be found [here](https://huggingface.co/datasets/BramVanroy/wikipedia_culturax_dutch), which also describes the filtering that took place to ensure high data quality.

## Training procedure

+ I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power for this project. Accounting for time spent waiting for jobs, training took around two weeks on four nodes of 4x A100 80GB GPUs each (16 GPUs in total).
+
### Training hyperparameters

The following hyperparameters were used during training:
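
The updated README points readers to the GitHub repository for real usage examples. As a minimal sketch only, the snippet below shows how a model like this could be run with the standard `transformers` text-generation pipeline. The model ID `BramVanroy/fietje-2` and the sampling settings are assumptions (the diff does not name the repository); substitute the actual model ID of this repo.

```python
# Minimal sketch, not the official usage example (see the Fietje GitHub repo for those).
# Assumption: the model ID is "BramVanroy/fietje-2"; adjust if the repository name differs.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="BramVanroy/fietje-2",   # assumed model ID, not stated in the diff
    torch_dtype=torch.bfloat16,    # half precision keeps the 2.7B model light on memory
    device_map="auto",             # place the model on GPU if one is available
)

# The card describes continued pretraining (no instruction tuning), so the model is
# prompted as a plain Dutch text completer rather than as a chat assistant.
prompt = "Het mooiste aan de Nederlandse taal is"
outputs = generator(
    prompt,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.95,
    temperature=0.7,
)
print(outputs[0]["generated_text"])
```

The sampling parameters above are illustrative defaults, not values recommended by the model card.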