metadata

license: cc-by-nc-4.0
datasets:
  - starmpcc/Asclepius-Synthetic-Clinical-Notes
language:
  - en
pipeline_tag: text-generation
tags:
  - medical

Model Card for Asclepius-Llama2-13B-Pretraining-Only

This is a pre-trained Llama2-13B model, trained with causal language modeling on Asclepius-Synthetic-Clinical-Notes.

The Asclepius-Llama2-13B model was developed from this checkpoint by applying instruction fine-tuning.

UPDATE

2024.01.10

Asclepius-R, a variant of Asclepius trained on MIMIC-III discharge summaries, is now available on PhysioNet!

Model Details

Model Description

Model type: Clinical LLM (Large Language Model)
Language(s) (NLP): English
License: CC-BY-NC-SA 4.0
Finetuned from model: Llama2-13B

Model Sources

Repository: https://github.com/starmpcc/Asclepius
Paper: https://arxiv.org/abs/2309.00237
Data: https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes

Uses

This model was trained with causal language modeling on Asclepius-Synthetic-Clinical-Notes.

Out-of-Scope Use

ONLY USE THIS MODEL FOR RESEARCH PURPOSES!

How to Get Started with the Model

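from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the slow (non-fast) tokenizer and the pre-trained checkpoint.
tokenizer = AutoTokenizer.from_pretrained("starmpcc/Asclepius-Llama2-13B-Pretraining-Only", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("starmpcc/Asclepius-Llama2-13B-Pretraining-Only")

# This checkpoint is not instruction-tuned, so it simply continues the input text.
model_input = "YOUR INPUT"
input_ids = tokenizer(model_input, return_tensors="pt").input_ids
# Without an explicit length, generate() stops after a short default; 128 is an arbitrary example value.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))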
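
Before loading the checkpoint, it can help to estimate how much memory the weights alone will need. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes a nominal 13 billion parameters (taken from the model name) and ignores activations, the KV cache, and framework overhead. The helper name `weight_memory_gb` is hypothetical.

```python
# Bytes used to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

# Assumption: nominal parameter count implied by the "13B" in the model name.
NUM_PARAMS = 13e9


def weight_memory_gb(num_params, dtype):
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
```

Under these assumptions the weights take roughly 52 GB in float32 and 26 GB in float16. If memory is tight, `from_pretrained` accepts a `torch_dtype` argument (for example `torch_dtype=torch.float16`) to load the weights in half precision.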

Training Details

Training Data

https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes

Training Procedure

Causal language modeling on synthetic clinical notes.
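
For readers unfamiliar with the objective, causal language modeling trains the model to predict each token from the tokens before it, and the loss is the mean cross-entropy over those next-token predictions. The following is a minimal pure-Python sketch of that loss on toy inputs; it is illustrative only and is not the training code used for this model. The function name `causal_lm_loss` and the 3-token vocabulary are hypothetical.

```python
import math


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy.

    logits[t] holds unnormalized scores over the vocabulary after the model
    has read token_ids[: t + 1]; it is scored against token_ids[t + 1].
    """
    total = 0.0
    count = 0
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log of the softmax normalizer.
        max_score = max(scores)
        log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        # Negative log-probability assigned to the true next token.
        total += log_norm - scores[target]
        count += 1
    return total / count


# Toy sequence over a hypothetical 3-token vocabulary.
token_ids = [0, 2, 1]

# A model with no preference spreads probability evenly: loss = log(3).
uniform_logits = [[0.0, 0.0, 0.0] for _ in token_ids]
uniform_loss = causal_lm_loss(uniform_logits, token_ids)

# A model that strongly favors the correct next token gets a much lower loss.
peaked_logits = [[0.0, 0.0, 10.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
peaked_loss = causal_lm_loss(peaked_logits, token_ids)
```

The last position has no following token, so its logits do not contribute to the loss.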

Training Hyperparameters

We followed the configuration used in Stanford Alpaca.

Speeds, Sizes, Times

Pre-Training (1 epoch): 1h 58m on 8x A100 80GB
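
To compare this figure with other training runs, the wall-clock time can be converted to GPU-hours. This is plain arithmetic on the numbers reported above; the helper name `gpu_hours` is hypothetical.

```python
def gpu_hours(hours, minutes, num_gpus):
    """Convert wall-clock time on num_gpus devices into total GPU-hours."""
    wall_clock_hours = hours + minutes / 60
    return wall_clock_hours * num_gpus


# 1h 58m on 8 GPUs, as reported for one epoch of pre-training.
pretraining_gpu_hours = gpu_hours(1, 58, 8)
print(f"{pretraining_gpu_hours:.1f} GPU-hours")  # prints "15.7 GPU-hours"
```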

Citation

BibTeX:

@misc{kweon2023publicly,
    title={Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes},
    author={Sunjun Kweon and Junu Kim and Jiyoun Kim and Sujeong Im and Eunbyeol Cho and Seongsu Bae and Jungwoo Oh and Gyubok Lee and Jong Hak Moon and Seng Chan You and Seungjin Baek and Chang Hoon Han and Yoon Bin Jung and Yohan Jo and Edward Choi},
    year={2023},
    eprint={2309.00237},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}