Update README.md
---
tags:
- medical
---

# ClinicalBERT

<!-- Provide a quick summary of what the model is/does. -->

This model card describes the ClinicalBERT model, which was trained on a large multicenter dataset that we constructed, comprising a corpus of 1.2 billion words covering diverse diseases. We then fine-tuned the base language model on a very large corpus of EHRs from 3,136,266 pediatric outpatient visits.

## Pretraining Data

For more details, see here.

## Model Pretraining

### Pretraining Procedures

ClinicalBERT was initialized from BERT. Training then followed the masked language modeling objective: given a piece of text, we randomly replace some tokens with the special [MASK] token and require the model to predict the original tokens from the surrounding context.

The training code can be found [here](https://www.github.com/xxx), and the model was trained on four A100 GPUs.
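
As a rough illustration of this masked-token objective (a sketch for intuition only, not the training code itself, and assuming the published checkpoint keeps its masked-language-modeling head), you can mask a token and ask the model to fill it in:

```python
from transformers import pipeline

# Mask one token and let the model predict it from context,
# mirroring the pretraining objective described above.
fill = pipeline("fill-mask", model="medicalai/ClinicalBERT")
masked = f"The patient was prescribed {fill.tokenizer.mask_token} for the infection."
for prediction in fill(masked, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```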

### Pretraining Hyperparameters

All other default parameters were used (xxx).

Load the model via the transformers library:

```python
from transformers import AutoTokenizer, AutoModel

# Download the tokenizer and the pretrained model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("medicalai/ClinicalBERT")
model = AutoModel.from_pretrained("medicalai/ClinicalBERT")
```
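
As a usage example, the following minimal sketch continues from the snippet above (it assumes the PyTorch backend); extracting contextual embeddings with a plain forward pass is only an illustration, not a prescribed downstream recipe:

```python
import torch

# Continues from the snippet above: encode a clinical sentence and
# run a forward pass to obtain contextual token embeddings.
text = "The patient presented with fever and a persistent cough."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```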

Refer to the paper xxx.

## Questions?

Post a GitHub issue on the xxx repo or email xxx with any questions.