amindada committed on
Commit 77eedc0
1 parent: 6162bec

Update README.md

Files changed (1): README.md +7 -7
README.md CHANGED
@@ -33,11 +33,11 @@ The pre-training dataset consists of documents from different domains:
 | Medical | Smaller public datasets | 253MB | 179,776 | 50M |
 | Medical | CC medical texts | 3.6GB | 2,000,000 | 682M |
 | Medical | Medical Dissertations | 1.4GB | 14,496 | 295M |
-| Medical | Pubmed abstracts (translated | 8.5GB | 21,044,382 | 1.7B |
+| Medical | Pubmed abstracts (translated) | 8.5GB | 21,044,382 | 1.7B |
 | Medical | MIMIC III (translated) | 2.6GB | 24,221,834 | 695M |
-| Medical | PMC-Patients-ReCDS (translated | 2.1GB | 1,743,344 | 414M |
+| Medical | PMC-Patients-ReCDS (translated) | 2.1GB | 1,743,344 | 414M |
 | Literature | German Fiction | 1.1GB | 3,219 | 243M |
-| Literature | English books (translated | 7.1GB | 11,038 | 1.6B |
+| Literature | English books (translated) | 7.1GB | 11,038 | 1.6B |
 | - | Total | 167GB | 116,079,769 | 35.8B |
 
 
@@ -55,10 +55,10 @@ The following table presents the F1 scores:
 
 | Model | [GE14](https://huggingface.co/datasets/germeval_14) | [GQuAD](https://huggingface.co/datasets/deepset/germanquad) | [GE18](https://huggingface.co/datasets/philschmid/germeval18) | TS | [GGP](https://github.com/JULIELab/GGPOnc) | GRAS<sup>1</sup> | [JS](https://github.com/JULIELab/jsyncc) | [DROC](https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release) | Avg |
 |:---------------------:|:--------:|:----------:|:--------:|:--------:|:-------:|:------:|:--------:|:------:|:------:|
-| GBERT-base | 87.10±0.12 | 72.19±0.82 | 51.27±1.4 | 72.34±0.48 | 78.17±0.25 | 62.90±0.01 | 77.18±3.34 | 88.03±0.20 | 73.65±0.50 |
-| GELECTRA-base | 86.19±0.5 | 74.09±0.70 | 48.02±1.80 | 70.62±0.44 | 77.53±0.11 | 65.97±0.01 | 71.17±2.94 | 88.06±0.37 | 72.71±0.66 |
-| GottBERT | 87.15±0.19 | 72.76±0.378 | 51.12±1.20 | 74.25±0.80 | **78.18**±0.11 | 65.71±0.01 | 74.60±4.75 | 88.61±0.23 | 74.05±0.51 |
-| GeBERTa-base | **88.06**±0.22 | **78.54**±0.32 | **53.16**±1.39 | **74.83**±0.36 | 78.13±0.15 | **68.37**±1.11 | **81.85**±5.23 | **89.14**±0.32 | **76.51**±0.32 |
+| [GBERT](https://huggingface.co/deepset/gbert-base)<sub>base</sub> | 87.10±0.12 | 72.19±0.82 | 51.27±1.4 | 72.34±0.48 | 78.17±0.25 | 62.90±0.01 | 77.18±3.34 | 88.03±0.20 | 73.65±0.50 |
+| [GELECTRA](https://huggingface.co/deepset/gelectra-base)<sub>gelectra</sub> | 86.19±0.5 | 74.09±0.70 | 48.02±1.80 | 70.62±0.44 | 77.53±0.11 | 65.97±0.01 | 71.17±2.94 | 88.06±0.37 | 72.71±0.66 |
+| [GottBERT](https://huggingface.co/uklfr/gottbert-base) | 87.15±0.19 | 72.76±0.378 | 51.12±1.20 | 74.25±0.80 | **78.18**±0.11 | 65.71±0.01 | 74.60±4.75 | 88.61±0.23 | 74.05±0.51 |
+| GeBERTa<sub>base</sub> | **88.06**±0.22 | **78.54**±0.32 | **53.16**±1.39 | **74.83**±0.36 | 78.13±0.15 | **68.37**±1.11 | **81.85**±5.23 | **89.14**±0.32 | **76.51**±0.32 |
 
 <sup>1</sup>Is not published yet but is described in the [MedBERT.de paper](https://arxiv.org/abs/2303.08179).
 
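The first hunk closes three parentheses that were left open in the pre-training dataset table. A check like the following could flag such cells before committing. It is a minimal sketch, assuming simple Markdown table rows with no escaped pipes and no pipes inside code spans; `unbalanced_cells` is a hypothetical helper for illustration, not part of any tool used in this repository.

```python
def unbalanced_cells(row: str) -> list[str]:
    """Return the cells of one Markdown table row whose '(' and ')' do not balance.

    Hypothetical helper; assumes no escaped pipes or pipes inside code spans.
    """
    # Drop the outer pipes, then split on the inner ones.
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    bad = []
    for cell in cells:
        depth = 0
        for ch in cell:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # a ')' with no matching '(' so far
                    break
        if depth != 0:
            bad.append(cell)
    return bad


if __name__ == "__main__":
    # Rows copied from the "-" and "+" sides of the first hunk.
    before = "| Medical | Pubmed abstracts (translated | 8.5GB | 21,044,382 | 1.7B |"
    after = "| Medical | Pubmed abstracts (translated) | 8.5GB | 21,044,382 | 1.7B |"
    print(unbalanced_cells(before))
    print(unbalanced_cells(after))
```

Markdown links such as `[GBERT](https://huggingface.co/deepset/gbert-base)` in the second table contain balanced parentheses, so they pass this check unchanged.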
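The Avg column in the F1 table is consistent with an unweighted mean of the eight per-task scores: recomputing it from the table reproduces every reported average to two decimals. The sketch below performs that check, assuming equal weight per task and ignoring the ± spreads, whose aggregation the README does not describe. The names `SCORES`, `unweighted_mean`, and `check_averages` are hypothetical and used only for illustration.

```python
# Per-task F1 scores copied from the second hunk (± spreads omitted).
# Assumption: "Avg" is the unweighted mean of the eight task columns.
TASKS = ["GE14", "GQuAD", "GE18", "TS", "GGP", "GRAS", "JS", "DROC"]

SCORES = {
    "GBERT-base": ([87.10, 72.19, 51.27, 72.34, 78.17, 62.90, 77.18, 88.03], 73.65),
    "GELECTRA-base": ([86.19, 74.09, 48.02, 70.62, 77.53, 65.97, 71.17, 88.06], 72.71),
    "GottBERT": ([87.15, 72.76, 51.12, 74.25, 78.18, 65.71, 74.60, 88.61], 74.05),
    "GeBERTa-base": ([88.06, 78.54, 53.16, 74.83, 78.13, 68.37, 81.85, 89.14], 76.51),
}


def unweighted_mean(values: list[float]) -> float:
    """Arithmetic mean giving every task the same weight."""
    return sum(values) / len(values)


def check_averages(tol: float = 0.005) -> dict[str, float]:
    """Recompute each model's mean and assert it rounds to the reported Avg."""
    recomputed = {}
    for model, (per_task, reported_avg) in SCORES.items():
        assert len(per_task) == len(TASKS), model
        mean = unweighted_mean(per_task)
        # A difference within half a hundredth means the two-decimal value matches.
        assert abs(mean - reported_avg) <= tol, (model, mean, reported_avg)
        recomputed[model] = mean
    return recomputed


if __name__ == "__main__":
    for model, mean in check_averages().items():
        print(f"{model}: {mean:.4f}")
```

Equal weighting means a single high-variance task (for example JS, with spreads above ±2.9 for every model) moves the average as much as any other column.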