Commit d755b32 by malteos (parent: 2f64fbf): Update README.md

Files changed: README.md (+30 -0)
---
license: mit
language: de
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in "
---

# German GPT2-XL (1.5B)

- trained with [BigScience's DeepSpeed-Megatron-LM code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- word embeddings initialized with [WECHSEL](https://arxiv.org/abs/2112.06598); all other weights taken from the English [gpt2-xl](https://huggingface.co/gpt2-xl)
- trained for ~3 days on 16x A100 GPUs (~80 TFLOPs per GPU)
- stopped after 100k steps (26.2B tokens)
- less than a single epoch on `oscar_unshuffled_deduplicated_de` (excluding the validation set; the original English model was trained for 75 epochs on less data)
- bf16 precision
- ZeRO stage 0
- tensor/pipeline parallelism degree 1 (tp/pp = 1)

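The WECHSEL initialization mentioned above can be sketched as follows. This is a toy illustration of the core idea (each target-language embedding is built as a similarity-weighted combination of source-language embeddings), not the paper's exact procedure, which derives the cross-lingual similarities from aligned static fastText embeddings; all names here are illustrative:

```python
import numpy as np

def wechsel_init(source_emb, similarity, k=3):
    """Initialize target-language embeddings as convex combinations of
    source-language embeddings, weighted by cross-lingual similarity.

    source_emb: (V_src, d) source embedding matrix
    similarity: (V_tgt, V_src) cross-lingual similarity scores
    k: number of most-similar source tokens to combine per target token
    """
    V_tgt = similarity.shape[0]
    d = source_emb.shape[1]
    target_emb = np.zeros((V_tgt, d))
    for t in range(V_tgt):
        top = np.argsort(similarity[t])[-k:]   # k most similar source tokens
        w = similarity[t, top]
        w = np.exp(w) / np.exp(w).sum()        # softmax over the k neighbours
        target_emb[t] = w @ source_emb[top]    # convex combination
    return target_emb

# Toy example: 4 source tokens with 2-d embeddings, 2 target tokens.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 2))
sim = rng.uniform(size=(2, 4))
tgt = wechsel_init(src, sim, k=2)
print(tgt.shape)  # (2, 2)
```

All non-embedding weights are then copied unchanged from the English `gpt2-xl` checkpoint, as listed above.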

## Evaluation

| Model (size) | PPL |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |

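PPL here is perplexity, the exponential of the mean per-token negative log-likelihood, so lower is better. A minimal sketch of the metric (the helper name is illustrative):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    token_logprobs: natural-log probabilities the model assigned to
    each observed token.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that is uniformly uncertain over 4 outcomes has perplexity 4.
logprobs = [math.log(0.25)] * 4
print(round(perplexity(logprobs), 6))  # 4.0
```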
## License

MIT