File size: 4,694 Bytes
74f3fb9
 
 
 
 
d416e4c
 
74f3fb9
 
d416e4c
 
 
74f3fb9
d416e4c
74f3fb9
d416e4c
 
 
 
74f3fb9
 
d416e4c
cf9b433
d416e4c
 
 
 
 
 
 
a09b906
1eb85cd
d416e4c
 
 
3d3a57f
d416e4c
a09b906
 
 
d416e4c
74f3fb9
48f8fdc
74f3fb9
44e5560
74f3fb9
 
 
e176936
74f3fb9
 
 
fd0f283
e176936
 
 
 
74f3fb9
 
 
048a29d
 
 
 
 
74f3fb9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d416e4c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
---
license: mit
base_model: BramVanroy/fietje-2b
tags:
- trl
- fietje
- alignment-handbook
- sft
datasets:
- BramVanroy/ultrachat_200k_dutch
- BramVanroy/no_robots_dutch
- BramVanroy/belebele_dutch
model-index:
- name: fietje-2b-instruct
  results: []
pipeline_tag: text-generation
inference: false
language:
- nl
---

<p align="center" style="margin:0;padding:0">
  <img src="https://huggingface.co/BramVanroy/fietje-2b-instruct/resolve/main/img/fietje-2b-banner-rounded.png" alt="Fietje banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
</p>

<div style="margin:auto; text-align:center">
  <h1 style="margin-bottom: 0">Fietje 2B Instruct</h1>
  <em>An open and efficient LLM for Dutch</em>
</div>

<blockquote class="tip" style="padding: 1.5em; border: 0">
  <p align="center" style="text-align: center; margin: 0">
    <a rel="nofollow" href="https://huggingface.co/BramVanroy/fietje-2b">πŸ‘±β€β™€οΈ Base version</a> -
    <a rel="nofollow" href="https://huggingface.co/BramVanroy/fietje-2b-instruct">πŸ€– Instruct version</a> (this one) -
    <a rel="nofollow" href="https://huggingface.co/BramVanroy/fietje-2b-chat">πŸ’¬ Chat version</a> -
    <a rel="nofollow" href="https://huggingface.co/BramVanroy/fietje-2b-chat-GGUF">πŸš€ GGUF of Instruct</a>
  </p>
  <p align="center" style="text-align: center; margin: 0">
    <a href="https://huggingface.co/spaces/BramVanroy/fietje-2b"><strong>Chat with Fietje here!</strong></a>
  </p>
</blockquote>

This is the instruct version of Fietje, an SFT-tuned (instruction-tuned) variant of [the base model](https://huggingface.co/BramVanroy/fietje-2b). Fietje is an adapated version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B tokens. It is small and efficient with a size of 2.7 billion parameters while performing almost on par with more powerful Dutch LLMs of twice its size like [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).

A thorough description of the creation and evaluation of Fietje as well as usage examples are available in [this Github repository](https://github.com/BramVanroy/fietje).

## Intended uses & limitations

The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

## Training and evaluation data

Fietje 2B instruct was finetuned from [the base model](https://huggingface.co/BramVanroy/fietje-2b) on the following datasets. Number of training samples per dataset given in brackets, totalling 201,579 samples.

- [BramVanroy/ultrachat_200k_dutch](https://huggingface.co/datasets/BramVanroy/ultrachat_200k_dutch): gpt-4-1106-preview; multi-turn; fully generated (192,598)
- [BramVanroy/no_robots_dutch](https://huggingface.co/datasets/BramVanroy/no_robots_dutch): gpt-4-1106-preview; prompt translate, answer generated; some items have system messages (8181)
- [BramVanroy/belebele_dutch](https://huggingface.co/datasets/BramVanroy/belebele_dutch): Dutch portion of [belebele](https://huggingface.co/datasets/facebook/belebele), formatted into SFT format (800)

## Training procedure

I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Accounting for waiting for jobs, training took around a day on four nodes of 4x A100 80GB each (16 total). I cannot find the exact time anymore and I do not think that the runtime in `all_results.json` accounts for interrupted-and-continued jobs.

Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. Exact training recipes and SLURM script are given in the [Github repository](https://github.com/BramVanroy/fietje).


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 42
- eval_batch_size: 42
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- total_train_batch_size: 672
- total_eval_batch_size: 672
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.9325        | 1.0   | 178  | 0.9060          |
| 0.8687        | 2.0   | 356  | 0.8850          |
| 0.8385        | 3.0   | 534  | 0.8818          |


### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2