## RadBERT-RoBERTa-4m

This is one variant of our RadBERT models, trained on 4 million deidentified medical reports from US VA hospitals. It achieves stronger medical language understanding performance than previous medical-domain models such as BioBERT, Clinical-BERT, BLUE-BERT, and BioMed-RoBERTa.

Performance is evaluated on three tasks:

(a) abnormal sentence classification: classify sentences in radiology reports as reporting abnormal or normal findings;

(b) report coding: assign a diagnostic code to a given radiology report, for five different coding systems;

(c) report summarization: given the findings section of a radiology report, extractively select key sentences that summarize the findings.

For details, check out the paper here: [RadBERT: Adapting transformer-based language models to radiology](https://pubs.rsna.org/doi/abs/10.1148/ryai.210258)

Code for the paper is released at [this GitHub repo](https://github.com/zzxslp/RadBERT).

### How to use

Here is an example of how to use this model to extract the features of a given text in PyTorch:

```python
from transformers import AutoConfig, AutoTokenizer, AutoModel

config = AutoConfig.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
tokenizer = AutoTokenizer.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
model = AutoModel.from_pretrained('zzxslp/RadBERT-RoBERTa-4m', config=config)

text = "Replace me with any medical text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)  # output.last_hidden_state holds the token-level features
```

### BibTeX entry and citation info

If you use the model, please cite our paper:

```bibtex
@article{yan2022radbert,
  title={RadBERT: Adapting transformer-based language models to radiology},
  author={Yan, An and McAuley, Julian and Lu, Xing and Du, Jiang and Chang, Eric Y and Gentili, Amilcare and Hsu, Chun-Nan},
  journal={Radiology: Artificial Intelligence},
  volume={4},
  number={4},
  pages={e210258},
  year={2022},
  publisher={Radiological Society of North America}
}
```
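### Example: pooling features for sentence-level tasks

The feature-extraction example above returns per-token hidden states. For a sentence-level use case such as the abnormal sentence classification task described earlier, those token embeddings are typically pooled into one fixed-size vector per sentence. Below is a minimal sketch of mean pooling over non-padding tokens; the example sentences and the pooling strategy are illustrative assumptions, not the setup used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
model = AutoModel.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
model.eval()

# Hypothetical example sentences: one normal and one abnormal finding.
sentences = [
    "No acute cardiopulmonary abnormality.",
    "There is a large right pleural effusion.",
]

# Tokenize as a batch; padding/truncation keep the tensor shapes consistent.
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    output = model(**encoded)

# Mean-pool token embeddings over non-padding positions: one vector per sentence.
mask = encoded['attention_mask'].unsqueeze(-1)         # (batch, seq_len, 1)
summed = (output.last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
embeddings = summed / mask.sum(dim=1)                  # (batch, hidden)

print(embeddings.shape)  # e.g. torch.Size([2, 768]) for a base-size encoder
```

These fixed-size vectors can be fed to any lightweight classifier; for best task performance, fine-tuning the whole model (e.g., via `AutoModelForSequenceClassification`) is the more common approach.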