mfajcik committed
Commit
2454536
1 Parent(s): 14cacb0

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -3,9 +3,10 @@ This is our GPT-2 XL trained as a part of the research involved in [SemANT proje
 
 ## Factsheet
 - The model is trained on our `15,621,685,248 token/78,48 GB/10,900,000,000 word/18,800,000 paragraph` corpus of Czech obtained by Web Crawling.
-- The original size of our corpus before deduplication and lm-filtering steps was`266,44 GB`.
+- The original size of our corpus before deduplication and lm-filtering steps was `266,44 GB`.
 - The model was trained on
 - Our tokenizer size is 64k, and we use GPT-2 like `sentencepiece` encoding for tokenization.
+- The model was trained by 133,000 update steps (~139B training tokens), before the end of the experiment.
 - The model was adapted from the original GPT-2 XL, by:
   - replacing the tokenizer,
   - corresponding embeddings, and
@@ -32,7 +33,7 @@ Not mentioned parameters are the same as for GPT-2.
 | gradient_clipping_max_norm | 1.0 | |
 | attn_impl | flash2 | |
 | dropout | 0.1 | for residuals, attention, embeddings |
-| fsdp | SHARD_GRAD_OP | (optimized for A100 40GB gpus) |
+| fsdp | SHARD_GRAD_OP | (optimized for A100 40GB GPUs) |
 | precision | bf16 | |
 | scheduler | linear | |
 | scheduler_warmup | 10,000 steps | |
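
The adaptation steps in the factsheet (replace the tokenizer, then the corresponding embeddings) map onto a short `transformers` recipe. A minimal sketch, assuming the Hugging Face `transformers` API; the tokenizer path below is an illustrative assumption, not this repository's actual identifier:

```python
# Sketch: swap GPT-2 XL's tokenizer for a new 64k vocabulary and resize the
# embedding matrices to match. Paths are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")          # original English GPT-2 XL
tokenizer = AutoTokenizer.from_pretrained("path/to/czech-64k")   # hypothetical 64k Czech tokenizer

# Resize the input embeddings (and the tied output head) to the new
# vocabulary size; rows for new tokens are freshly initialized and must
# be learned during continued training.
model.resize_token_embeddings(len(tokenizer))
```

`resize_token_embeddings` reinitializes the rows that have no counterpart in the old vocabulary, which is why the embeddings have to be retrained, as the factsheet implies.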
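The hyperparameter table also corresponds to standard PyTorch components. A hedged sketch, assuming plain PyTorch FSDP plus the `transformers` linear-warmup schedule; the diff does not state which training stack was actually used, so the wiring below is an assumption:

```python
# Sketch of the table's settings in plain PyTorch. Requires an initialized
# process group (e.g. launched via torchrun); the actual trainer used for
# this model is not stated in the diff.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy
from transformers import get_linear_schedule_with_warmup

def wrap_and_schedule(model, optimizer, total_steps=133_000):
    # fsdp = SHARD_GRAD_OP, precision = bf16
    fsdp_model = FSDP(
        model,
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        mixed_precision=MixedPrecision(
            param_dtype=torch.bfloat16,
            reduce_dtype=torch.bfloat16,
            buffer_dtype=torch.bfloat16,
        ),
    )
    # scheduler = linear, scheduler_warmup = 10,000 steps
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=10_000, num_training_steps=total_steps
    )
    return fsdp_model, scheduler

# In the training loop, gradient_clipping_max_norm = 1.0 would be applied as
#   fsdp_model.clip_grad_norm_(max_norm=1.0)
# before optimizer.step(); FSDP provides clip_grad_norm_ for sharded gradients.
```

`SHARD_GRAD_OP` shards gradients and optimizer state but keeps the full parameters on each rank after the forward pass, trading memory for less communication, which is consistent with the table's note about A100 40GB GPUs.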