---
license: mit
language: de
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in "
---

# German GPT2-XL (1.5B)

- trained with [BigScience's DeepSpeed-Megatron-LM code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- word embeddings initialized with [WECHSEL](https://arxiv.org/abs/2112.06598); all other weights taken from English [gpt2-xl](https://huggingface.co/gpt2-xl)
- trained for ~3 days on 16x A100 GPUs (~80 TFLOPs / GPU)
- stopped after 100k steps
- 26.2B tokens seen
- less than a single epoch on `oscar_unshuffled_deduplicated_de` (excluding the validation set; the original model was trained for 75 epochs on less data)
- bf16 precision
- ZeRO stage 0
- tp/pp = 1 (no tensor or pipeline parallelism)
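As a back-of-the-envelope sanity check, the token count is consistent with 100k steps at GPT-2's 1024-token context length and a global batch of 256 sequences per step. Note that the context length and batch size are assumptions here, not figures stated above:

```python
# Back-of-the-envelope token count. seq_len (1024) and global_batch (256)
# are assumptions; only the step count comes from the training details above.
seq_len = 1024        # GPT-2 context window (assumed)
global_batch = 256    # sequences per optimizer step (assumed)
steps = 100_000       # stated above

total_tokens = seq_len * global_batch * steps
print(f"{total_tokens / 1e9:.1f}B tokens")  # 26.2B tokens
```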

## Evaluation

| Model (size) | PPL |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |
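Assuming PPL here is the usual language-model perplexity, it is the exponential of the mean per-token cross-entropy in nats. A minimal sketch (the 2.674 loss value is back-computed from the 14.5 PPL above, not a reported number):

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Perplexity from mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_nll_nats)

# Hypothetical loss value, back-computed from the 14.5 PPL reported above.
print(round(perplexity(2.674), 1))  # 14.5
```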

## License

MIT