---
license: mit
language: de
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in "
---

# German GPT2-XL (1.5B)

- trained with [BigScience's DeepSpeed-Megatron-LM code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- word embeddings initialized with [WECHSEL](https://arxiv.org/abs/2112.06598) (see the sketch below); all other weights taken from the English [gpt2-xl](https://huggingface.co/gpt2-xl)
- trained for ~3 days on 16x A100 GPUs (~80 TFLOPS per GPU)
- stopped after 100k steps (26.2B tokens)
- less than a single epoch over `oscar_unshuffled_deduplicated_de` (excluding the held-out validation set; the original model was trained for 75 epochs on less data)
- bf16 precision
- ZeRO stage 0
- no tensor or pipeline parallelism (tp = pp = 1)

## Evaluation

Perplexity (PPL; lower is better) on the held-out validation set (see the evaluation sketch below):

| Model (size) | PPL |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |

## License

MIT
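## Usage

A minimal generation sketch using the `transformers` pipeline API. The model id below is assumed from the evaluation table; substitute the actual Hub path this card is published under.

```python
from transformers import pipeline

# Model id assumed from the evaluation table; replace it with the actual
# Hub repo id (e.g. "<namespace>/gpt2-xl-wechsel-german") if it differs.
generator = pipeline("text-generation", model="gpt2-xl-wechsel-german")

# The example prompt from this card's widget metadata.
prompt = (
    "In einer schockierenden Entdeckung fanden Wissenschaftler "
    "eine Herde Einhörner, die in "
)

print(generator(prompt, max_new_tokens=50, do_sample=True, top_p=0.95)[0]["generated_text"])
```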
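## WECHSEL initialization

For reference, a sketch of the WECHSEL embedding transfer using the authors' `wechsel` package, adapted from that project's README. The model documented here was subsequently trained with Megatron-DeepSpeed, so treat this as an illustration of the initialization step, not the exact pipeline used.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from wechsel import WECHSEL, load_embeddings

# English source model: all weights except the word embeddings are kept.
source_tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# Train a German tokenizer with the same vocabulary size on OSCAR.
target_tokenizer = source_tokenizer.train_new_from_iterator(
    load_dataset("oscar", "unshuffled_deduplicated_de", split="train")["text"],
    vocab_size=len(source_tokenizer),
)

# WECHSEL aligns fastText embeddings of the two languages through a
# bilingual dictionary and maps the source embeddings into the target
# vocabulary.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("de"),
    bilingual_dictionary="german",
)

target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)

model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)
# GPT-2 ties input and output embeddings; retie after replacing them.
model.tie_weights()
```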
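## Perplexity evaluation

The card does not state how the perplexity numbers were computed; a common recipe for GPT-2-style models is a sliding-window evaluation, sketched below (the function name and window parameters are illustrative, not the exact setup behind the table above).

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


@torch.no_grad()
def sliding_window_ppl(model, tokenizer, text, max_length=1024, stride=512, device="cuda"):
    """Corpus perplexity with overlapping windows; only the tokens not
    already scored by a previous window contribute to the loss."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    seq_len = input_ids.size(1)
    total_nll, n_scored, prev_end = 0.0, 0, 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end  # tokens newly scored in this window
        window = input_ids[:, begin:end].to(device)
        targets = window.clone()
        targets[:, :-trg_len] = -100  # mask already-scored tokens
        loss = model(window, labels=targets).loss  # mean NLL over scored tokens
        total_nll += loss.item() * trg_len
        n_scored += trg_len
        prev_end = end
        if end == seq_len:
            break
    return math.exp(total_nll / n_scored)


model = AutoModelForCausalLM.from_pretrained("gpt2-xl-wechsel-german").to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl-wechsel-german")
# Placeholder: the held-out validation documents are not published with this card.
validation_text = "\n\n".join(["..."])
print(sliding_window_ppl(model, tokenizer, validation_text))
```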