Update README.md
README.md
This is our GPT-2 XL trained as a part of the research involved in the [SemANT project].

## Factsheet

- The model is trained on our `15,621,685,248 tokens / 78.48 GB / 10,900,000,000 words / 18,800,000 paragraphs` corpus of Czech obtained by web crawling.
- The original size of our corpus before the deduplication and LM-filtering steps was `266.44 GB`.
- The model was trained on
- Our tokenizer size is 64k, and we use GPT-2-like `sentencepiece` encoding for tokenization (see the sketch after this list).
- The model was trained for 133,000 update steps (~139B training tokens) before the end of the experiment.
- The model was adapted from the original GPT-2 XL by:
  - replacing the tokenizer,
  - corresponding embeddings, and
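The two items above, a custom 64k `sentencepiece`-style tokenizer and swapped-in embeddings, can be made concrete with a short sketch. This is an assumption about how the adaptation would look in Hugging Face `transformers`, not the authors' actual code, and the tokenizer repo name is a placeholder.

```python
# Hedged sketch of the adaptation described above: load GPT-2 XL, swap in a
# custom 64k tokenizer, and resize the embedding matrix to the new vocabulary.
# "your-org/czech-64k-tokenizer" is a hypothetical repo name.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/czech-64k-tokenizer")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

print(len(tokenizer))  # should report the ~64k vocabulary size

# Replace the original embeddings with a matrix sized for the new vocabulary;
# the rows no longer match GPT-2's English vocabulary and must be retrained.
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer.encode("Dobrý den, jak se máte?")
print(tokenizer.decode(ids))  # round-trips back to the original Czech text
```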
Parameters not mentioned are the same as for GPT-2.

| gradient_clipping_max_norm | 1.0 | |
| attn_impl | flash2 | |
| dropout | 0.1 | for residuals, attention, embeddings |
| fsdp | SHARD_GRAD_OP | optimized for A100 40GB GPUs |
| precision | bf16 | |
| scheduler | linear | |
| scheduler_warmup | 10,000 steps | |
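To make the table rows concrete, below is a minimal sketch of how the `fsdp`, `precision`, `scheduler`, and `gradient_clipping_max_norm` settings map onto PyTorch and `transformers` APIs. It is an assumption about the setup, not the authors' training code, and uses the stock `gpt2-xl` checkpoint as a stand-in.

```python
# Hedged sketch of the configuration above; assumes a torch.distributed
# process group is already initialized (e.g. via torchrun).
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # stand-in checkpoint

# precision = bf16: parameters, gradient reduction, and buffers in bfloat16.
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# fsdp = SHARD_GRAD_OP: shard gradients and optimizer state across ranks but
# keep gathered parameters after forward, a memory/speed fit for A100 40GB.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    mixed_precision=bf16,
)

optimizer = torch.optim.AdamW(model.parameters())

# scheduler = linear with scheduler_warmup = 10,000 steps; the 133,000 total
# update steps come from the factsheet above.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=133_000
)

# gradient_clipping_max_norm = 1.0, applied each update step with FSDP's
# own sharding-aware method: model.clip_grad_norm_(1.0)
```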