---
title: README
emoji: 📉
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
---
# German Wikipedia LMs (GWLMs)
We present Language Models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.
This is an ongoing project!
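As a quick usage sketch, one of the pretrained BERT checkpoints could be loaded with Hugging Face `transformers` for masked-token prediction; the model ID below is a placeholder, not a confirmed Hub name.

```python
# Minimal sketch: masked-token prediction with one of the pretrained German Wikipedia
# BERT models. The model ID is an assumed placeholder; substitute the actual Hub ID.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="gwlms/bert-base-dewiki-v1")  # placeholder ID
print(fill_mask("Die Hauptstadt von Deutschland ist [MASK]."))
```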
## German Wikipedia Corpus
We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (using NLTK) is available here.
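The following is a minimal sketch of how sentence segmentation with NLTK could look for German text; it is not the project's exact preprocessing script.

```python
# Sketch: German sentence segmentation with NLTK's Punkt tokenizer.
import nltk

nltk.download("punkt")  # Punkt models (newer NLTK versions may also need "punkt_tab")

text = "Berlin ist die Hauptstadt Deutschlands. Die Stadt hat rund 3,7 Millionen Einwohner."
sentences = nltk.sent_tokenize(text, language="german")
print(sentences)
```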
## Fine-tuned Models
We fine-tuned NER models with the SpanMarker library on the GermEval 2014 NER dataset and uploaded the best models (a usage sketch follows the list):
- GermEval 2014 NER Model with BERT as backbone LM
- GermEval 2014 NER Model with BERT + Token Dropping as backbone LM
- GermEval 2014 NER Model with TEAMS as backbone LM
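A minimal usage sketch for the fine-tuned NER models, assuming they are published on the Hugging Face Hub; the model ID below is a placeholder, not a confirmed name.

```python
# Sketch: named entity recognition with a fine-tuned SpanMarker model.
from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("gwlms/span-marker-bert-germeval14")  # placeholder ID
entities = model.predict("Angela Merkel besuchte die Universität Leipzig.")
for entity in entities:
    print(entity["span"], entity["label"], round(entity["score"], 3))
```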
## Acknowledgements
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️