---
license: mit
language: de
widget:
- text: >-
    In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde
    Einhörner, die in
---
# German GPT2-XL (1.5B)
- trained with BigScience's Megatron-DeepSpeed code base
- word embedding initialized with WECHSEL; all other weights taken from English gpt2-xl (see the initialization sketch below)
- ~3 days on 16x A100 GPUs (~80 TFLOPs / GPU)
- stopped after 100k steps
- 26.2B tokens
- less than a single epoch on `oscar_unshuffled_deduplicated_de` (excluding the validation set; the original model was trained for 75 epochs on less data)
- bf16
- ZeRO stage 0
- tp/pp = 1 (no tensor or pipeline parallelism)
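
A minimal sketch of the WECHSEL initialization step, assuming the public `wechsel` package API (`WECHSEL`, `load_embeddings`) and a German tokenizer trained on the same OSCAR split; the `bilingual_dictionary` name follows the pattern in the `wechsel` README and is an assumption, not the exact script used for this model.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from wechsel import WECHSEL, load_embeddings

# Source checkpoint: English gpt2-xl supplies all weights except the word embeddings.
source_tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# Train a German tokenizer with the same vocabulary size on the OSCAR split
# (the full split is large; a subsample also works for a sketch).
target_tokenizer = source_tokenizer.train_new_from_iterator(
    load_dataset("oscar", "unshuffled_deduplicated_de", split="train")["text"],
    vocab_size=len(source_tokenizer),
)

# WECHSEL transfers the English word embeddings to the German vocabulary via
# static embeddings aligned with a bilingual dictionary.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("de"),
    bilingual_dictionary="german",  # dictionary name assumed, per the wechsel README pattern
)
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)

# Replace only the word embeddings; all other weights stay as in English gpt2-xl.
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)
```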
## Evaluation
| Model (size) | PPL |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |
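
The card does not spell out the perplexity protocol; the reported numbers were measured on held-out OSCAR validation data. As a minimal sketch, perplexity of a causal LM can be computed as the exponential of the mean cross-entropy loss. The hub ID below is an assumption derived from the model name (adjust the namespace), and the single widget prompt stands in for a proper evaluation set.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID assumed from the model name above; adjust the namespace as needed.
model_id = "gpt2-xl-wechsel-german"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = (
    "In einer schockierenden Entdeckung fanden Wissenschaftler "
    "eine Herde Einhörner, die in"
)
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean
    # next-token cross-entropy; perplexity is its exponential.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"PPL: {torch.exp(loss).item():.2f}")
```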
## License

MIT