stefan-it committed on
Commit 91dd9d9
1 Parent(s): 38be2b2

readme: add initial version

Files changed (1): README.md (+22 -1)
README.md CHANGED
@@ -7,4 +7,25 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ # German Wikipedia LMs (GWLMs)
+ 
+ We present Language Models ([BERT](https://huggingface.co/gwlms/bert-base-dewiki-v1), [BERT with Token Dropping](https://huggingface.co/gwlms/bert-base-token-dropping-dewiki-v1), [TEAMS](https://huggingface.co/gwlms/teams-base-dewiki-v1-discriminator), [T5](https://huggingface.co/gwlms/t5-efficient-large-dewiki-v1)) pretrained on German Wikipedia.
+ 
+ This is an ongoing project!
+ 
+ # German Wikipedia Corpus
+ 
+ We use a recent Wikipedia dump, which can be accessed [here](https://huggingface.co/datasets/gwlms/dewiki-20230701). Additionally, a sentence-segmented version (created using NLTK) is available [here](https://huggingface.co/datasets/gwlms/dewiki-20230701-nltk-corpus).
+ 
+ # Fine-tuned Models
+ 
+ We fine-tuned NER models on the GermEval 2014 NER dataset using the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) library and uploaded the best models:
+ 
+ * [GermEval 2014 NER model with BERT as backbone LM](https://huggingface.co/gwlms/span-marker-bert-germeval14)
+ * [GermEval 2014 NER model with BERT + Token Dropping as backbone LM](https://huggingface.co/gwlms/span-marker-token-dropping-bert-germeval14)
+ * [GermEval 2014 NER model with TEAMS as backbone LM](https://huggingface.co/gwlms/span-marker-teams-germeval14)
+ 
+ # Acknowledgements
+ 
+ Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
+ Many thanks for providing access to the TPUs ❤️