---
license: mit
language: de
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in "
---

# German GPT2-XL (1.5B)

- trained with [BigScience's DeepSpeed-Megatron-LM code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- word embeddings initialized with [WECHSEL](https://arxiv.org/abs/2112.06598); all other weights taken from English [gpt2-xl](https://huggingface.co/gpt2-xl)
- trained for ~3 days on 16x A100 GPUs (~80 TFLOPs / GPU)
- stopped after 100k steps
- 26.2B tokens seen
- less than a single epoch on `oscar_unshuffled_deduplicated_de` (excluding the validation set; the original model was trained for 75 epochs on less data)
- bf16 precision
- ZeRO stage 0
- tp/pp = 1 (no tensor or pipeline parallelism)
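As a back-of-the-envelope sanity check, the token count is consistent with 100k steps at GPT-2's 1024-token context length and a global batch of 256 sequences per step. Note that the context length and batch size are assumptions here, not figures stated above:

```python
# Back-of-the-envelope token count. seq_len (1024) and global_batch (256)
# are assumptions; only the step count comes from the training details above.
seq_len = 1024        # GPT-2 context window (assumed)
global_batch = 256    # sequences per optimizer step (assumed)
steps = 100_000       # stated above

total_tokens = seq_len * global_batch * steps
print(f"{total_tokens / 1e9:.1f}B tokens")  # 26.2B tokens
```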

## Evaluation

| Model (size) | PPL |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |
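Assuming PPL here is the usual language-model perplexity, it is the exponential of the mean per-token cross-entropy in nats. A minimal sketch (the 2.674 loss value is back-computed from the 14.5 PPL above, not a reported number):

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Perplexity from mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_nll_nats)

# Hypothetical loss value, back-computed from the 14.5 PPL reported above.
print(round(perplexity(2.674), 1))  # 14.5
```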

## License

MIT