Commit acb1449 by amindada
Parent: e22d013

Update README.md

Files changed (1)
  1. README.md +4 -8
README.md CHANGED
```diff
@@ -7,8 +7,8 @@
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
-Developed in a joint effort between the University of Florida, NVIDIA, and IKIM, GeBERTa is a series of German DeBERTa models ranging between 122M and 750M
-parameters. The pre-training dataset consists of documents from different domains:
+GeBERTa is a set of German DeBERTa models developed in a joint effort between the University of Florida, NVIDIA, and IKIM.
+The models range in size from 122M to 750M parameters. The pre-training dataset consists of documents from different domains:
 
 | Category | Source Data | Data Size | #Docs | #Tokens |
 | -------- | ----------- | --------- | ------ | ------- |
@@ -18,12 +18,8 @@ parameters. The pre-training dataset consists of documents from different domain
 | Informal | Reddit 2019-2023 (GER) | 5.8GB | 15,036,592 | 1.3B |
 | Informal | Holiday Reviews | 2GB | 4,876,405 | 428M |
 | Legal | OpenLegalData: German cases and laws | 5.4GB | 308,228 | 1B |
-| Medical | Charite doctoral theses abstracts | 28MB | 16,947 | 5M |
-| Medical | Flexikon | 106MB | 74,136 | 23M |
-| Medical | NTS of Animal Experiments | 24MB | 50,310 | 4M |
-| Medical | German Guideline Program in Oncology | 13MB | 4,348 | 3M |
-| Medical | Springer Abstract | 79MB | 34,035 | 15M |
-| Medical | CC medical texts (GER) | 3.6GB | 2,000,000 | 682M |
+| Medical | Smaller public datasets | 253MB | 179,776 | 50M |
+| Medical | CC medical texts | 3.6GB | 2,000,000 | 682M |
 | Medical | Medicine Dissertations | 1.4GB | 14,496 | 295M |
 | Medical | Pubmed abstracts | 8.5GB | 21,044,382 | 1.7B |
 | Medical | MIMIC III | 2.6GB | 24,221,834 | 695M |
```
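The second hunk replaces five small medical sources with a single "Smaller public datasets" row. The commit message does not say so, but the numbers suggest that the new row is an aggregate of the five removed ones. The short sketch below uses only figures copied from the diff to check that reading. The variable names are illustrative and are not part of the model card.

```python
# Medical rows removed in the second hunk:
# (source, data size in MB, #docs, #tokens in millions), copied from the diff.
removed_rows = [
    ("Charite doctoral theses abstracts", 28, 16_947, 5),
    ("Flexikon", 106, 74_136, 23),
    ("NTS of Animal Experiments", 24, 50_310, 4),
    ("German Guideline Program in Oncology", 13, 4_348, 3),
    ("Springer Abstract", 79, 34_035, 15),
]

# The single row added in their place, in the same units.
added_row = ("Smaller public datasets", 253, 179_776, 50)

# Sum each numeric column of the removed rows.
total_mb = sum(row[1] for row in removed_rows)
total_docs = sum(row[2] for row in removed_rows)
total_tokens_m = sum(row[3] for row in removed_rows)

# Compare the column totals with the new aggregate row.
print(f"size:   {total_mb}MB vs {added_row[1]}MB")
print(f"docs:   {total_docs:,} vs {added_row[2]:,}")
print(f"tokens: {total_tokens_m}M vs {added_row[3]}M")
```

The document total (179,776) and the token total (50M) match the new row exactly. The data sizes sum to 250MB rather than the 253MB shown, which is presumably a rounding effect in the per-source sizes. That explanation is an assumption; the commit does not state how the row was computed.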