---
license: mit
base_model: microsoft/phi-2
tags:
  - trl
  - conversational
  - fietje
  - alignment-handbook
datasets:
  - uonlp/CulturaX
  - wikimedia/wikipedia
model-index:
  - name: fietje-2b
    results: []
language:
  - nl
pipeline_tag: text-generation
---

*Fietje banner*

# Fietje 2B

An open and efficient LLM for Dutch.

🚀 Looking for the fast GGUF version? You can find it, and how to use it with ollama, here. 🚀

This model is an adapted version of microsoft/phi-2, fine-tuned for Dutch text generation. It was continue-pretrained on 28B Dutch tokens, consisting of the full Dutch portion of Wikipedia supplemented with Dutch tokens from CulturaX. A newer version of this dataset, which also describes the filtering that was applied, can be found here.
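As a quick illustration, below is a minimal sketch of generating Dutch text with the 🤗 Transformers pipeline. The repository id `BramVanroy/fietje-2` is an assumption based on this page and may need to be adjusted.

```python
# Minimal sketch: Dutch text generation with the Transformers pipeline.
# The model id "BramVanroy/fietje-2" is assumed from this page, not confirmed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="BramVanroy/fietje-2",  # assumed repository id
    device_map="auto",
)

prompt = "Het mooiste aan fietsen in Nederland is"
outputs = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```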

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of how they might map onto `TrainingArguments` follows the list:

- learning_rate: 9e-05
- train_batch_size: 40
- eval_batch_size: 40
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 3
- total_train_batch_size: 1920
- total_eval_batch_size: 640
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- num_epochs: 1.0
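For reference, the sketch below shows how these settings could be expressed as Hugging Face `TrainingArguments`. It is illustrative only, not the exact alignment-handbook recipe used to train the model, and the output directory name is hypothetical.

```python
# Illustrative mapping of the hyperparameters listed above onto TrainingArguments.
# This is a sketch, not the exact training configuration that was used.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fietje-2",          # hypothetical output directory
    learning_rate=9e-05,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=3,  # 40 per device * 16 GPUs * 3 steps = 1920 total
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-07,
    seed=42,
)
```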

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6334        | 0.13  | 900  | 1.5937          |
| 1.5469        | 0.26  | 1800 | 1.5051          |
| 1.4937        | 0.4   | 2700 | 1.4628          |
| 1.4633        | 0.53  | 3600 | 1.4375          |
| 1.4485        | 0.66  | 4500 | 1.4203          |
| 1.4374        | 0.79  | 5400 | 1.4085          |
| 1.4278        | 0.92  | 6300 | 1.4013          |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2