Update README.md
README.md
CHANGED
@@ -7,8 +7,8 @@
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
-
-parameters. The pre-training dataset consists of documents from different domains:
+GeBERTa is a set of German DeBERTa models developed in a joint effort between the University of Florida, NVIDIA, and IKIM.
+The models range in size from 122M to 750M parameters. The pre-training dataset consists of documents from different domains:
 
 | Category | Source Data | Data Size | #Docs | #Tokens |
 | -------- | ----------- | --------- | ------ | ------- |
@@ -18,12 +18,8 @@ parameters. The pre-training dataset consists of documents from different domain
 | Informal | Reddit 2019-2023 (GER) | 5.8GB | 15,036,592 | 1.3B |
 | Informal | Holiday Reviews | 2GB | 4,876,405 | 428M |
 | Legal | OpenLegalData: German cases and laws | 5.4GB | 308,228 | 1B |
-| Medical |
-| Medical |
-| Medical | NTS of Animal Experiments | 24MB | 50,310 | 4M |
-| Medical | German Guideline Program in Oncology | 13MB | 4,348 | 3M |
-| Medical | Springer Abstract | 79MB | 34,035 | 15M |
-| Medical | CC medical texts (GER) | 3.6GB | 2,000,000 | 682M |
+| Medical | Smaller public datasets | 253MB | 179,776 | 50M |
+| Medical | CC medical texts | 3.6GB | 2,000,000 | 682M |
 | Medical | Medicine Dissertations | 1.4GB | 14,496 | 295M |
 | Medical | Pubmed abstracts | 8.5GB | 21,044,382 | 1.7B |
 | Medical | MIMIC III | 2.6GB | 24,221,834 | 695M |
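
The `#Tokens` column uses `M` and `B` suffixes, which makes the rows hard to add up at a glance. The sketch below is one way to total the Medical rows as they stand after this change. It is not part of the README: the helper names (`parse_token_count`, `total_tokens`) are hypothetical, and the suffixes are assumed to mean millions and billions. Only the rows visible in this diff are included, so the result is not a total for the whole pre-training dataset.

```python
# Medical rows as they appear after this change: (source data, #Tokens as written).
MEDICAL_ROWS = [
    ("Smaller public datasets", "50M"),
    ("CC medical texts", "682M"),
    ("Medicine Dissertations", "295M"),
    ("Pubmed abstracts", "1.7B"),
    ("MIMIC III", "695M"),
]

# Assumed meaning of the suffixes in the #Tokens column.
SUFFIXES = {"M": 10**6, "B": 10**9}


def parse_token_count(text: str) -> int:
    """Convert a string such as '682M' or '1.7B' into an integer token count."""
    text = text.strip()
    number, suffix = text[:-1], text[-1].upper()
    # round() guards against floating-point error in values like 1.7 * 10**9.
    return round(float(number) * SUFFIXES[suffix])


def total_tokens(rows) -> int:
    """Sum the parsed #Tokens values over a list of (source, tokens) rows."""
    return sum(parse_token_count(tokens) for _, tokens in rows)


medical_total = total_tokens(MEDICAL_ROWS)
print(f"Medical rows shown: {medical_total / 1e9:.2f}B tokens")
```

Because the published figures are rounded to two or three significant digits, the sum is approximate as well.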