Update README.md
README.md CHANGED
````diff
@@ -66,34 +66,11 @@ The following table presents the F1 scores:
 ## Publication
 
 ```bibtex
-@inproceedings{
-    title = "On the Impact of Cross-Domain Data on German Language Models",
-    author = "Dada, Amin and
-      Chen, Aokun and
-      Peng, Cheng and
-      Smith, Kaleb and
-      Idrissi-Yaghir, Ahmad and
-      Seibold, Constantin and
-      Li, Jianning and
-      Heiliger, Lars and
-      Friedrich, Christoph and
-      Truhn, Daniel and
-      Egger, Jan and
-      Bian, Jiang and
-      Kleesiek, Jens and
-      Wu, Yonghui",
-    editor = "Bouamor, Houda and
-      Pino, Juan and
-      Bali, Kalika",
-    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
-    month = dec,
-    year = "2023",
-    address = "Singapore",
-    publisher = "Association for Computational Linguistics",
-    url = "https://aclanthology.org/2023.findings-emnlp.922",
-    doi = "10.18653/v1/2023.findings-emnlp.922",
-    pages = "13801--13813",
-    abstract = "Traditionally, large language models have been either trained on general web crawls or domain-specific data. However, recent successes of generative large language models, have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed at containing high-quality data. Through training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, leading to improvements up to 4.45{\%} over the previous state-of-the-art.",
+@inproceedings{dada2023impact,
+  title={On the Impact of Cross-Domain Data on German Language Models},
+  author={Dada, Amin and Chen, Aokun and Peng, Cheng and Smith, Kaleb E and Idrissi-Yaghir, Ahmad and Seibold, Constantin Marc and Li, Jianning and Heiliger, Lars and Friedrich, Christoph M and Truhn, Daniel and others},
+  booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
+  year={2023}
 }
 ```
 ## Contact
````