metadata
license: cc-by-nc-4.0
datasets:
- starmpcc/Asclepius-Synthetic-Clinical-Notes
language:
- en
pipeline_tag: text-generation
tags:
- medical
Model Card for Asclepius-Llama2-13B-Pretraining-Only
This is a pre-trained Llama2-13B model, trained with causal language modeling on Asclepius-Synthetic-Clinical-Notes.
The Asclepius-Llama2-13B model was developed from this checkpoint by applying instruction fine-tuning.
UPDATE
2024.01.10
- Asclepius-R, a variant of Asclepius trained on MIMIC-III discharge summaries, is now available on PhysioNet!
Model Details
Model Description
- Model type: Clinical LLM (Large Language Model)
- Language(s) (NLP): English
- License: CC-BY-NC-SA 4.0
- Finetuned from model: Llama2-13B
Model Sources
- Repository: https://github.com/starmpcc/Asclepius
- Paper: https://arxiv.org/abs/2309.00237
- Data: https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes
Uses
This model was trained with causal language modeling on Asclepius-Synthetic-Clinical-Notes.
Out-of-Scope Use
ONLY USE THIS MODEL FOR RESEARCH PURPOSES!!
How to Get Started with the Model
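from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the slow (SentencePiece) tokenizer and the pre-trained checkpoint.
tokenizer = AutoTokenizer.from_pretrained("starmpcc/Asclepius-Llama2-13B-Pretraining-Only", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("starmpcc/Asclepius-Llama2-13B-Pretraining-Only")

# Tokenize the prompt and generate a continuation.
model_input = "YOUR INPUT"
input_ids = tokenizer(model_input, return_tensors="pt").input_ids
# max_new_tokens is an illustrative cap on the generated length; adjust as needed.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))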
Training Details
Training Data
https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes
Training Procedure
- Causal language modeling on synthetic clinical notes.
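As a rough illustration of the causal language modeling objective named above, the sketch below shifts a token sequence by one position so that each token is used to predict the next, then averages the negative log-likelihood of the targets. It is a toy in pure Python, not the training code used for this model: the helper names (`shift_for_causal_lm`, `causal_lm_loss`), the token ids, and the probability tables are all hypothetical.

```python
import math

def shift_for_causal_lm(token_ids):
    """Return (inputs, targets) so that each position predicts the next token."""
    return token_ids[:-1], token_ids[1:]

def causal_lm_loss(step_probs, targets):
    """Mean negative log-likelihood of the target tokens.

    step_probs[i] is a hypothetical model's distribution over the next
    token at position i, given as a dict of token id -> probability.
    """
    nll = [-math.log(step_probs[i][t]) for i, t in enumerate(targets)]
    return sum(nll) / len(nll)

# Toy token ids (illustrative, not real tokenizer output).
tokens = [5, 7, 9, 2]
inputs, targets = shift_for_causal_lm(tokens)

# Made-up next-token probabilities for each input position.
probs = [
    {7: 0.5, 9: 0.5},
    {9: 0.25, 2: 0.75},
    {2: 1.0},
]
loss = causal_lm_loss(probs, targets)
print(inputs, targets, loss)
```

In an actual training run the probabilities would come from the model's softmax over its vocabulary, and the loss would be minimized over the clinical notes.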
Training Hyperparameters
- We followed the configuration used in Stanford Alpaca.
Speeds, Sizes, Times
- Pre-Training (1 epoch): 1h 58m with 8x A100 80G
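For comparing compute budgets, the wall-clock figure above can be converted to GPU-hours. The snippet is simple arithmetic on the numbers reported in this card; the variable names are illustrative.

```python
# Figures from the card: 1 h 58 min of wall-clock time on 8 GPUs.
hours, minutes, num_gpus = 1, 58, 8

wall_clock_hours = hours + minutes / 60
gpu_hours = wall_clock_hours * num_gpus
print(f"{gpu_hours:.1f} GPU-hours")
```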
Citation
BibTeX:
@misc{kweon2023publicly,
title={Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes},
author={Sunjun Kweon and Junu Kim and Jiyoun Kim and Sujeong Im and Eunbyeol Cho and Seongsu Bae and Jungwoo Oh and Gyubok Lee and Jong Hak Moon and Seng Chan You and Seungjin Baek and Chang Hoon Han and Yoon Bin Jung and Yohan Jo and Edward Choi},
year={2023},
eprint={2309.00237},
archivePrefix={arXiv},
primaryClass={cs.CL}
}