Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Pretrained GPT-2 medium model on Finnish language using a causal language modeling
 [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 and first released at [this page](https://openai.com/blog/better-language-models/).
 
-**Note**: this model is the 345M parameter variant as in Huggingface's [GPT-2-medium config](https://huggingface.co/gpt2-medium), so not the famous big 1.5B parameter variant by OpenAI.
+**Note**: this model is the 345M parameter variant as in Huggingface's [GPT-2-medium config](https://huggingface.co/gpt2-medium), so not the famous big 1.5B parameter variant by OpenAI. We also have a larger 774M parameter variant available, [gpt2-large-finnish](https://huggingface.co/Finnish-NLP/gpt2-large-finnish), which performs better than this model.
 
 ## Model description
 
@@ -106,17 +106,18 @@ vocabulary size of 50,257. The inputs are sequences of 512 consecutive tokens.
 
 ### Pretraining
 
-The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 360k steps. The optimizer used was AdamW with a learning rate of 1e-4, learning rate warmup for 4000 steps, and cosine decay of the learning rate afterwards.
+The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 360k steps (a bit over 1 epoch with a batch size of 128). The optimizer used was AdamW with a learning rate of 1e-4, learning rate warmup for 4000 steps, and cosine decay of the learning rate afterwards.
 
 
 ## Evaluation results
 
-Evaluation was done using the *validation* split of the [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned) dataset with [Perplexity](https://huggingface.co/course/chapter7/3#perplexity-for-language-models) (the smaller the score, the better) as the evaluation metric. As seen from the table below, this model (the first row of the table) performs better than our smaller [gpt2-finnish](https://huggingface.co/Finnish-NLP/gpt2-finnish) model variant.
+Evaluation was done using the *validation* split of the [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned) dataset with [Perplexity](https://huggingface.co/course/chapter7/3#perplexity-for-language-models) (the smaller the score, the better) as the evaluation metric. As seen from the table below, this model (the first row of the table) performs better than our smaller [gpt2-finnish](https://huggingface.co/Finnish-NLP/gpt2-finnish) model variant but loses to our larger [gpt2-large-finnish](https://huggingface.co/Finnish-NLP/gpt2-large-finnish) model.
 
 |                                          | Perplexity |
 |------------------------------------------|------------|
-|Finnish-NLP/gpt2-medium-finnish
+|Finnish-NLP/gpt2-medium-finnish |34.08 |
 |Finnish-NLP/gpt2-finnish |44.19 |
+|Finnish-NLP/gpt2-large-finnish |**30.74** |
 
 ## Team Members
 
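For readers who want to set up a comparable optimization schedule, here is a minimal sketch in PyTorch using the `transformers` scheduler helpers. It only illustrates the hyperparameters described in the Pretraining section above (AdamW, peak learning rate 1e-4, 4000 warmup steps, cosine decay over 360k total steps); it is not the original TPU training script.

```python
# Minimal sketch of the optimizer and schedule described in the Pretraining
# section. Assumption: PyTorch + transformers; the original run used a TPUv3-8
# VM and is not reproduced here.
from torch.optim import AdamW
from transformers import GPT2Config, GPT2LMHeadModel, get_cosine_schedule_with_warmup

config = GPT2Config.from_pretrained("gpt2-medium")  # 345M parameter configuration
model = GPT2LMHeadModel(config)                     # fresh weights for pretraining

optimizer = AdamW(model.parameters(), lr=1e-4)      # peak learning rate 1e-4
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=4_000,       # linear warmup for the first 4000 steps
    num_training_steps=360_000,   # cosine decay over the remaining steps
)

# In a training loop, each step would then do:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```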
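For reference, a perplexity number like those in the table above can in principle be computed with the `transformers` and `datasets` libraries. The sketch below is only an illustration: it assumes the validation split exposes a plain `text` field and scores a small sample of documents, so it will not exactly match the reported figures.

```python
# Rough sketch of a perplexity evaluation for Finnish-NLP/gpt2-medium-finnish on
# the validation split of mc4_fi_cleaned: perplexity = exp(mean cross-entropy).
# Assumptions: the dataset has a "text" column; only a small sample is scored.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Finnish-NLP/gpt2-medium-finnish"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

dataset = load_dataset("Finnish-NLP/mc4_fi_cleaned", split="validation")

losses = []
with torch.no_grad():
    for example in dataset.select(range(100)):  # small sample for illustration
        ids = tokenizer(example["text"], return_tensors="pt",
                        truncation=True, max_length=512).input_ids  # 512-token blocks, as in pretraining
        if ids.size(1) < 2:
            continue
        # Passing labels=input_ids makes the model return the causal LM loss.
        losses.append(model(ids, labels=ids).loss.item())

print("perplexity:", math.exp(sum(losses) / len(losses)))
```

Perplexities computed this way depend on the tokenizer and the block length, so they are only comparable between models evaluated with the same setup, as in the table above.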