# 🇩🇪 GERTuraX-2
This repository hosts the GERTuraX-2 model:
- GERTuraX-2 is a pretrained German encoder-only model, based on ELECTRA and pretrained with the TEAMS approach.
- It was trained on 486GB of plain text from the CulturaX corpus.
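The checkpoint can be used for feature extraction or fine-tuning with the Hugging Face Transformers library. Below is a minimal sketch; it assumes the model is published under the `gerturax/gerturax-2` identifier (as used in the ScandEval commands later in this card) and that the checkpoint is loadable via the Auto classes:

```python
from transformers import AutoModel, AutoTokenizer

# Load the GERTuraX-2 tokenizer and encoder (cased, 64k vocabulary).
tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
model = AutoModel.from_pretrained("gerturax/gerturax-2")

# Encode a German sentence and obtain contextualized token embeddings.
inputs = tokenizer("München liegt im Süden von Deutschland.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```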
## Pretraining
The TensorFlow Model Garden LMs repository was used to train an ELECTRA model with the very efficient TEAMS approach.
As the pretraining corpus, 486GB of plain text was extracted from the CulturaX corpus.
GERTuraX-2 uses a cased 64k vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod, which corresponds to roughly 524B tokens seen during pretraining (1M × 1024 × 512).
The pretraining took 5.4 days and the TensorBoard can be found here.
## Evaluation
GERTuraX-2 was evaluated on GermEval 2014 (NER), GermEval 2018 (offensive language detection), CoNLL-2003 (NER) and on the ScandEval benchmark.
For GermEval 2014, GermEval 2018 and CoNLL-2003 we use the same hyper-parameters as the GeBERTa paper (cf. Table 5), perform five runs with different seeds and report the averaged score. All experiments were conducted with the awesome Flair library.
The fine-tuning code repository can be found here; a minimal sketch of the Flair setup is shown below.
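For illustration, here is a minimal Flair fine-tuning sketch for GermEval 2014. The corpus class, backbone identifier and training call follow Flair's standard API; the hyper-parameter values (learning rate, batch size, epochs) are illustrative placeholders, not necessarily the exact values from the GeBERTa paper:

```python
from flair.datasets import NER_GERMAN_GERMEVAL
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the GermEval 2014 NER corpus (downloaded automatically by Flair).
corpus = NER_GERMAN_GERMEVAL()
label_dictionary = corpus.make_label_dictionary(label_type="ner")

# Use GERTuraX-2 as a fine-tunable transformer backbone.
embeddings = TransformerWordEmbeddings(
    "gerturax/gerturax-2",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Linear tag head on top of the transformer (no CRF, no RNN).
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# Fine-tune; learning rate, batch size and epochs are placeholders here.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/germeval-2014",
    learning_rate=5e-5,
    mini_batch_size=16,
    max_epochs=10,
)
```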
### GermEval 2014
#### GermEval 2014 - Original version
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|---|---|---|
| GBERT Base | 87.53 ± 0.22 | 86.81 ± 0.16 |
| GERTuraX-1 (147GB) | 88.32 ± 0.21 | 87.18 ± 0.12 |
| GERTuraX-2 (486GB) | 88.58 ± 0.32 | 87.58 ± 0.15 |
| GERTuraX-3 (1.1TB) | 88.90 ± 0.06 | 87.84 ± 0.18 |
| GeBERTa Base | 88.79 ± 0.16 | 88.03 ± 0.16 |
#### GermEval 2014 - Without Wikipedia
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|---|---|---|
| GBERT Base | 90.48 ± 0.34 | 89.05 ± 0.21 |
| GERTuraX-1 (147GB) | 91.27 ± 0.11 | 89.73 ± 0.27 |
| GERTuraX-2 (486GB) | 91.70 ± 0.28 | 89.98 ± 0.22 |
| GERTuraX-3 (1.1TB) | 91.75 ± 0.17 | 90.24 ± 0.27 |
| GeBERTa Base | 91.74 ± 0.23 | 90.28 ± 0.21 |
### GermEval 2018
#### GermEval 2018 - Fine Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|---|---|---|
| GBERT Base | 63.66 ± 4.08 | 51.86 ± 1.31 |
| GERTuraX-1 (147GB) | 62.87 ± 1.95 | 50.61 ± 0.36 |
| GERTuraX-2 (486GB) | 64.37 ± 1.31 | 51.02 ± 0.90 |
| GERTuraX-3 (1.1TB) | 66.39 ± 0.85 | 49.94 ± 2.06 |
| GeBERTa Base | 65.81 ± 3.29 | 52.45 ± 0.57 |
#### GermEval 2018 - Coarse Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|---|---|---|
| GBERT Base | 83.15 ± 1.83 | 76.39 ± 0.64 |
| GERTuraX-1 (147GB) | 83.72 ± 0.68 | 77.11 ± 0.59 |
| GERTuraX-2 (486GB) | 84.51 ± 0.88 | 78.07 ± 0.91 |
| GERTuraX-3 (1.1TB) | 84.33 ± 1.48 | 78.44 ± 0.74 |
| GeBERTa Base | 83.54 ± 1.27 | 78.36 ± 0.79 |
### CoNLL-2003 - German, Revised
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|---|---|---|
| GBERT Base | 92.15 ± 0.10 | 88.73 ± 0.21 |
| GERTuraX-1 (147GB) | 92.32 ± 0.14 | 90.09 ± 0.12 |
| GERTuraX-2 (486GB) | 92.75 ± 0.20 | 90.15 ± 0.14 |
| GERTuraX-3 (1.1TB) | 92.77 ± 0.28 | 90.83 ± 0.16 |
| GeBERTa Base | 92.87 ± 0.21 | 90.94 ± 0.24 |
### ScandEval
We use v12.10.5 of ScandEval to evaluate on the following datasets:
- SB10k
- ScaLA-De
- GermanQuAD
The package can be installed via:
```bash
$ pip3 install "scandeval[all]==12.10.5"
```
#### Results
##### SB10k
Evaluations on the SB10k dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```
| Model Name | Matthews CC | Macro F1-Score |
|---|---|---|
| GBERT Base | 59.58 ± 1.80 | 72.98 ± 1.20 |
| GERTuraX-1 (147GB) | 61.56 ± 2.58 | 74.18 ± 1.77 |
| GERTuraX-2 (486GB) | 65.24 ± 1.77 | 76.55 ± 1.22 |
| GERTuraX-3 (1.1TB) | 64.33 ± 2.17 | 75.99 ± 1.40 |
| GeBERTa Base | 59.52 ± 2.14 | 72.76 ± 1.50 |
##### ScaLA-De
Evaluations on the ScaLA-De dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```
| Model Name | Matthews CC | Macro F1-Score |
|---|---|---|
| GBERT Base | 52.23 ± 4.34 | 73.90 ± 2.68 |
| GERTuraX-1 (147GB) | 74.55 ± 1.28 | 86.88 ± 0.75 |
| GERTuraX-2 (486GB) | 75.83 ± 2.85 | 87.59 ± 1.57 |
| GERTuraX-3 (1.1TB) | 78.24 ± 1.25 | 88.83 ± 0.63 |
| GeBERTa Base | 59.70 ± 11.64 | 78.44 ± 6.12 |
##### GermanQuAD

Evaluations on the GermanQuAD dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```
| Model Name | Exact Match | F1-Score |
|---|---|---|
| GBERT Base | 12.62 ± 2.20 | 29.62 ± 3.86 |
| GERTuraX-1 (147GB) | 27.24 ± 1.05 | 52.01 ± 1.10 |
| GERTuraX-2 (486GB) | 29.54 ± 1.05 | 55.12 ± 0.92 |
| GERTuraX-3 (1.1TB) | 28.49 ± 1.21 | 54.83 ± 1.26 |
| GeBERTa Base | 28.81 ± 1.77 | 53.27 ± 1.92 |
## ❤️ Acknowledgements
GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.
Many thanks for providing TPUs!
Made from Bavarian Oberland with ❤️ and 🥨.