Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Pretrained GPT-2 medium model on Finnish language using a causal language modeling
 [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 and first released at [this page](https://openai.com/blog/better-language-models/).
 
-**Note**: this model is the 345M parameter variant as in Huggingface's [GPT-2-medium config](https://huggingface.co/gpt2-medium), so not the famous big 1.5B parameter variant by OpenAI.
+**Note**: this model is the 345M parameter variant as in Huggingface's [GPT-2-medium config](https://huggingface.co/gpt2-medium), so not the famous big 1.5B parameter variant by OpenAI. We also have a larger 774M parameter variant available, [gpt2-large-finnish](https://huggingface.co/Finnish-NLP/gpt2-large-finnish), which performs better than this model.
 
 ## Model description
 
@@ -106,17 +106,18 @@ vocabulary size of 50,257. The inputs are sequences of 512 consecutive tokens.
 
 ### Pretraining
 
-The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 360k steps. The optimizer used was AdamW with a learning rate of 1e-4, learning rate warmup for 4000 steps, and cosine decay of the learning rate afterwards.
+The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 360k steps (a bit over 1 epoch with a batch size of 128). The optimizer used was AdamW with a learning rate of 1e-4, learning rate warmup for 4000 steps, and cosine decay of the learning rate afterwards.
 
 
 ## Evaluation results
 
-Evaluation was done using the *validation* split of the [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned) dataset with [Perplexity](https://huggingface.co/course/chapter7/3#perplexity-for-language-models) (the smaller the score, the better) as the evaluation metric. As seen from the table below, this model (the first row of the table) performs better than our smaller [gpt2-finnish](https://huggingface.co/Finnish-NLP/gpt2-finnish) model variant.
+Evaluation was done using the *validation* split of the [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned) dataset with [Perplexity](https://huggingface.co/course/chapter7/3#perplexity-for-language-models) (the smaller the score, the better) as the evaluation metric. As seen from the table below, this model (the first row of the table) performs better than our smaller [gpt2-finnish](https://huggingface.co/Finnish-NLP/gpt2-finnish) model variant but loses to our larger [gpt2-large-finnish](https://huggingface.co/Finnish-NLP/gpt2-large-finnish) model.
 
 |                                          | Perplexity |
 |------------------------------------------|------------|
-|Finnish-NLP/gpt2-medium-finnish
+|Finnish-NLP/gpt2-medium-finnish |34.08 |
 |Finnish-NLP/gpt2-finnish |44.19 |
+|Finnish-NLP/gpt2-large-finnish |**30.74** |
 
 ## Team Members
 
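For readers who want to set up a comparable optimization schedule, here is a minimal sketch in PyTorch using the `transformers` scheduler helpers. It only illustrates the hyperparameters described in the Pretraining section above (AdamW, peak learning rate 1e-4, 4000 warmup steps, cosine decay over 360k total steps); it is not the original TPU training script.

```python
# Minimal sketch of the optimizer and schedule described in the Pretraining
# section. Assumption: PyTorch + transformers; the original run used a TPUv3-8
# VM and is not reproduced here.
from torch.optim import AdamW
from transformers import GPT2Config, GPT2LMHeadModel, get_cosine_schedule_with_warmup

config = GPT2Config.from_pretrained("gpt2-medium")  # 345M parameter configuration
model = GPT2LMHeadModel(config)                     # fresh weights for pretraining

optimizer = AdamW(model.parameters(), lr=1e-4)      # peak learning rate 1e-4
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=4_000,       # linear warmup for the first 4000 steps
    num_training_steps=360_000,   # cosine decay over the remaining steps
)

# In a training loop, each step would then do:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```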
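For reference, a perplexity number like those in the table above can in principle be computed with the `transformers` and `datasets` libraries. The sketch below is only an illustration: it assumes the validation split exposes a plain `text` field and scores a small sample of documents, so it will not exactly match the reported figures.

```python
# Rough sketch of a perplexity evaluation for Finnish-NLP/gpt2-medium-finnish on
# the validation split of mc4_fi_cleaned: perplexity = exp(mean cross-entropy).
# Assumptions: the dataset has a "text" column; only a small sample is scored.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Finnish-NLP/gpt2-medium-finnish"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

dataset = load_dataset("Finnish-NLP/mc4_fi_cleaned", split="validation")

losses = []
with torch.no_grad():
    for example in dataset.select(range(100)):  # small sample for illustration
        ids = tokenizer(example["text"], return_tensors="pt",
                        truncation=True, max_length=512).input_ids  # 512-token blocks, as in pretraining
        if ids.size(1) < 2:
            continue
        # Passing labels=input_ids makes the model return the causal LM loss.
        losses.append(model(ids, labels=ids).loss.item())

print("perplexity:", math.exp(sum(losses) / len(losses)))
```

Perplexities computed this way depend on the tokenizer and the block length, so they are only comparable between models evaluated with the same setup, as in the table above.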