Update README.md
README.md
CHANGED
@@ -33,11 +33,11 @@ The pre-training dataset consists of documents from different domains:
 | Medical | Smaller public datasets | 253MB | 179,776 | 50M |
 | Medical | CC medical texts | 3.6GB | 2,000,000 | 682M |
 | Medical | Medical Dissertations | 1.4GB | 14,496 | 295M |
-| Medical | Pubmed abstracts (translated | 8.5GB | 21,044,382 | 1.7B |
+| Medical | Pubmed abstracts (translated) | 8.5GB | 21,044,382 | 1.7B |
 | Medical | MIMIC III (translated) | 2.6GB | 24,221,834 | 695M |
-| Medical | PMC-Patients-ReCDS (translated | 2.1GB | 1,743,344 | 414M |
+| Medical | PMC-Patients-ReCDS (translated) | 2.1GB | 1,743,344 | 414M |
 | Literature | German Fiction | 1.1GB | 3,219 | 243M |
-| Literature | English books (translated | 7.1GB | 11,038 | 1.6B |
+| Literature | English books (translated) | 7.1GB | 11,038 | 1.6B |
 | - | Total | 167GB | 116,079,769 | 35.8B |
 
 
@@ -55,10 +55,10 @@ The following table presents the F1 scores:
 
 | Model | [GE14](https://huggingface.co/datasets/germeval_14) | [GQuAD](https://huggingface.co/datasets/deepset/germanquad) | [GE18](https://huggingface.co/datasets/philschmid/germeval18) | TS | [GGP](https://github.com/JULIELab/GGPOnc) | GRAS<sup>1</sup> | [JS](https://github.com/JULIELab/jsyncc) | [DROC](https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release) | Avg |
 |:---------------------:|:--------:|:----------:|:--------:|:--------:|:-------:|:------:|:--------:|:------:|:------:|
-| GBERT-base
-| GELECTRA-base | 86.19±0.5 | 74.09±0.70 | 48.02±1.80 | 70.62±0.44 | 77.53±0.11 | 65.97±0.01 | 71.17±2.94 | 88.06±0.37 | 72.71±0.66 |
-| GottBERT | 87.15±0.19 | 72.76±0.378 | 51.12±1.20 | 74.25±0.80 | **78.18**±0.11 | 65.71±0.01 | 74.60±4.75 | 88.61±0.23 | 74.05±0.51 |
-| GeBERTa
+| [GBERT](https://huggingface.co/deepset/gbert-base)<sub>base</sub> | 87.10±0.12 | 72.19±0.82 | 51.27±1.4 | 72.34±0.48 | 78.17±0.25 | 62.90±0.01 | 77.18±3.34 | 88.03±0.20 | 73.65±0.50 |
+| [GELECTRA](https://huggingface.co/deepset/gelectra-base)<sub>gelectra</sub> | 86.19±0.5 | 74.09±0.70 | 48.02±1.80 | 70.62±0.44 | 77.53±0.11 | 65.97±0.01 | 71.17±2.94 | 88.06±0.37 | 72.71±0.66 |
+| [GottBERT](https://huggingface.co/uklfr/gottbert-base) | 87.15±0.19 | 72.76±0.378 | 51.12±1.20 | 74.25±0.80 | **78.18**±0.11 | 65.71±0.01 | 74.60±4.75 | 88.61±0.23 | 74.05±0.51 |
+| GeBERTa<sub>base</sub> | **88.06**±0.22 | **78.54**±0.32 | **53.16**±1.39 | **74.83**±0.36 | 78.13±0.15 | **68.37**±1.11 | **81.85**±5.23 | **89.14**±0.32 | **76.51**±0.32 |
 
 <sup>1</sup>Is not published yet but is described in the [MedBERT.de paper](https://arxiv.org/abs/2303.08179).
 
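The README does not say how the Avg column in the F1 table is computed. The numbers are consistent with an unweighted mean over the eight task scores, and the sketch below checks that reading against the added rows. The dictionary keys, the `mean_f1` and `check_averages` helpers, and the 0.01 tolerance are assumptions made for illustration; the ± spreads are ignored.

```python
# Mean F1 scores copied from the added ("+") rows of the table.
# Column order: GE14, GQuAD, GE18, TS, GGP, GRAS, JS, DROC.
TASK_SCORES = {
    "GBERT-base": [87.10, 72.19, 51.27, 72.34, 78.17, 62.90, 77.18, 88.03],
    "GELECTRA-base": [86.19, 74.09, 48.02, 70.62, 77.53, 65.97, 71.17, 88.06],
    "GottBERT": [87.15, 72.76, 51.12, 74.25, 78.18, 65.71, 74.60, 88.61],
    "GeBERTa-base": [88.06, 78.54, 53.16, 74.83, 78.13, 68.37, 81.85, 89.14],
}

# Values of the Avg column as reported in the same rows.
REPORTED_AVG = {
    "GBERT-base": 73.65,
    "GELECTRA-base": 72.71,
    "GottBERT": 74.05,
    "GeBERTa-base": 76.51,
}


def mean_f1(scores):
    """Unweighted arithmetic mean of the per-task F1 scores."""
    return sum(scores) / len(scores)


def check_averages(tolerance=0.01):
    """Return, per model, whether the recomputed mean matches the reported Avg.

    The tolerance (an assumption) absorbs rounding to two decimals.
    """
    return {
        model: abs(mean_f1(scores) - REPORTED_AVG[model]) < tolerance
        for model, scores in TASK_SCORES.items()
    }


if __name__ == "__main__":
    for model, matches in check_averages().items():
        print(f"{model}: recomputed {mean_f1(TASK_SCORES[model]):.4f}, "
              f"reported {REPORTED_AVG[model]:.2f}, match={matches}")
```

If a row were edited without updating Avg, the corresponding entry in `check_averages()` would turn `False`, which makes this a cheap sanity check before committing table changes.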