🇩🇪 GERTuraX-2

This repository hosts the GERTuraX-2 model:

  • GERTuraX-2 is a German encoder-only model, based on ELECTRA and pretrained with the TEAMS approach.
  • It was trained on 486GB of plain text from the CulturaX corpus.
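
For quick experimentation, the model can be loaded with the Hugging Face Transformers library. The following is a minimal sketch (assuming the gerturax/gerturax-2 checkpoint on the Hub, as also used in the ScandEval commands below). Note that GERTuraX-2 is an ELECTRA-style encoder, so it produces contextual embeddings for fine-tuning rather than masked-token predictions:

from transformers import AutoModel, AutoTokenizer

# Minimal usage sketch: load GERTuraX-2 as a plain encoder.
tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
model = AutoModel.from_pretrained("gerturax/gerturax-2")

# Encode a German sentence and inspect the contextual embeddings.
inputs = tokenizer("Die Hauptstadt von Bayern ist München.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)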

Pretraining

The TensorFlow Model Garden LMs repo was used to train an ELECTRA model using the very efficient TEAMS approach.

As pretraining corpus, 486GB of plain text was extracted from the CulturaX corpus.

GERTuraX-2 uses a 64k cased vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod.
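
The cased 64k vocabulary can be sanity-checked by inspecting the tokenizer (a small sketch, again assuming the Hub checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
print(len(tokenizer))  # expected: roughly 64k subwords
print(tokenizer.tokenize("Oberlandesgericht"))  # cased subword segmentation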

The pretraining took 5.4 days, and the TensorBoard logs can be found here.

Evaluation

GERTuraX-2 was evaluated on GermEval 2014 (NER), GermEval 2018 (offensive language identification), CoNLL-2003 (German NER), and on the ScandEval benchmark.

For GermEval 2014, GermEval 2018, and CoNLL-2003, we use the same hyper-parameters as the GeBERTa paper (cf. Table 5), performing 5 runs with different seeds and reporting the averaged score. All fine-tuning experiments were conducted with the awesome Flair library.

The fine-tuning code repository can be found here.
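
As an illustration, a Flair fine-tuning run on GermEval 2014 could look like the following sketch. The hyper-parameter values shown here are placeholders only; for the exact values, refer to Table 5 of the GeBERTa paper:

from flair.datasets import NER_GERMAN_GERMEVAL
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the GermEval 2014 NER corpus and build the label dictionary.
corpus = NER_GERMAN_GERMEVAL()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Use GERTuraX-2 as a fine-tunable transformer embedding.
embeddings = TransformerWordEmbeddings(
    model="gerturax/gerturax-2",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear tagger on top of the encoder (no CRF, no RNN).
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/germeval-2014",
    learning_rate=5e-5,   # placeholder; see GeBERTa paper, Table 5
    mini_batch_size=16,   # placeholder
    max_epochs=10,        # placeholder
)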

GermEval 2014

GermEval 2014 - Original version

Model Name          Avg. Development F1-Score   Avg. Test F1-Score
GBERT Base          87.53 ± 0.22                86.81 ± 0.16
GERTuraX-1 (147GB)  88.32 ± 0.21                87.18 ± 0.12
GERTuraX-2 (486GB)  88.58 ± 0.32                87.58 ± 0.15
GERTuraX-3 (1.1TB)  88.90 ± 0.06                87.84 ± 0.18
GeBERTa Base        88.79 ± 0.16                88.03 ± 0.16

GermEval 2014 - Without Wikipedia

Model Name          Avg. Development F1-Score   Avg. Test F1-Score
GBERT Base          90.48 ± 0.34                89.05 ± 0.21
GERTuraX-1 (147GB)  91.27 ± 0.11                89.73 ± 0.27
GERTuraX-2 (486GB)  91.70 ± 0.28                89.98 ± 0.22
GERTuraX-3 (1.1TB)  91.75 ± 0.17                90.24 ± 0.27
GeBERTa Base        91.74 ± 0.23                90.28 ± 0.21

GermEval 2018

GermEval 2018 - Fine Grained

Model Name          Avg. Development F1-Score   Avg. Test F1-Score
GBERT Base          63.66 ± 4.08                51.86 ± 1.31
GERTuraX-1 (147GB)  62.87 ± 1.95                50.61 ± 0.36
GERTuraX-2 (486GB)  64.37 ± 1.31                51.02 ± 0.90
GERTuraX-3 (1.1TB)  66.39 ± 0.85                49.94 ± 2.06
GeBERTa Base        65.81 ± 3.29                52.45 ± 0.57

GermEval 2018 - Coarse Grained

Model Name          Avg. Development F1-Score   Avg. Test F1-Score
GBERT Base          83.15 ± 1.83                76.39 ± 0.64
GERTuraX-1 (147GB)  83.72 ± 0.68                77.11 ± 0.59
GERTuraX-2 (486GB)  84.51 ± 0.88                78.07 ± 0.91
GERTuraX-3 (1.1TB)  84.33 ± 1.48                78.44 ± 0.74
GeBERTa Base        83.54 ± 1.27                78.36 ± 0.79

CoNLL-2003 - German, Revised

Model Name          Avg. Development F1-Score   Avg. Test F1-Score
GBERT Base          92.15 ± 0.10                88.73 ± 0.21
GERTuraX-1 (147GB)  92.32 ± 0.14                90.09 ± 0.12
GERTuraX-2 (486GB)  92.75 ± 0.20                90.15 ± 0.14
GERTuraX-3 (1.1TB)  92.77 ± 0.28                90.83 ± 0.16
GeBERTa Base        92.87 ± 0.21                90.94 ± 0.24

ScandEval

We use v12.10.5 of ScandEval to evaluate on the following datasets:

  • SB10k
  • ScaLA-De
  • GermanQuAD

The package can be installed via:

$ pip3 install "scandeval[all]==12.10.5"
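
Alternatively, ScandEval exposes a Python API via its Benchmarker class. The following is a hedged sketch only; the argument names mirror the v12 CLI flags and may differ between versions:

from scandeval import Benchmarker

# Hedged sketch: benchmark GERTuraX-2 on the German sentiment task.
benchmark = Benchmarker(task="sentiment-classification", language="de")
benchmark(model="gerturax/gerturax-2")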

Results

SB10k

Evaluations on the SB10k dataset can be started as follows:

$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de

Model Name          Matthews CC    Macro F1-Score
GBERT Base          59.58 ± 1.80   72.98 ± 1.20
GERTuraX-1 (147GB)  61.56 ± 2.58   74.18 ± 1.77
GERTuraX-2 (486GB)  65.24 ± 1.77   76.55 ± 1.22
GERTuraX-3 (1.1TB)  64.33 ± 2.17   75.99 ± 1.40
GeBERTa Base        59.52 ± 2.14   72.76 ± 1.50

ScaLA-De

Evaluations on the ScaLA-De dataset can be started as follows:

$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de

Model Name          Matthews CC     Macro F1-Score
GBERT Base          52.23 ± 4.34    73.90 ± 2.68
GERTuraX-1 (147GB)  74.55 ± 1.28    86.88 ± 0.75
GERTuraX-2 (486GB)  75.83 ± 2.85    87.59 ± 1.57
GERTuraX-3 (1.1TB)  78.24 ± 1.25    88.83 ± 0.63
GeBERTa Base        59.70 ± 11.64   78.44 ± 6.12

GermanQuAD

Evaluations on the GermanQuAD dataset can be started as follows:

$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de

Model Name          Exact Match    F1-Score
GBERT Base          12.62 ± 2.20   29.62 ± 3.86
GERTuraX-1 (147GB)  27.24 ± 1.05   52.01 ± 1.10
GERTuraX-2 (486GB)  29.54 ± 1.05   55.12 ± 0.92
GERTuraX-3 (1.1TB)  28.49 ± 1.21   54.83 ± 1.26
GeBERTa Base        28.81 ± 1.77   53.27 ± 1.92

❤️ Acknowledgements

GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.

Many thanks for providing TPUs!

Made from Bavarian Oberland with ❤️ and 🥨.
