---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: feature-extraction
tags:
- structuring
- EHR
- medical
- IE
---
# Model Card for GENIE

## Model Details

- **Model Size:** 8B (English) / 7B (Chinese)
- **Max Tokens:** 8192
- **Base model:** Llama 3.1 8B (English) / Qwen 2.5 7B (Chinese)

## Model Description
GENIE (Generative Note Information Extraction) is an end-to-end model for structuring EHR data, developed in collaboration between Sheng Yu's group (https://www.stat.tsinghua.edu.cn/teachers/shengyu/) and Tianxi Cai's group (https://dbmi.hms.harvard.edu/people/tianxi-cai). GENIE processes an entire paragraph of clinical notes in a single pass, outputting structured information on named entities, assertion statuses, locations, other relevant modifiers, clinical values, and intended purposes. This end-to-end approach simplifies the structuring pipeline, reduces errors, and enables healthcare providers to derive structured data from EHRs more efficiently, without extensive manual adjustment. Experiments show that GENIE achieves high accuracy on each of these tasks.
## Usage

```python
import json

from vllm import LLM, SamplingParams

model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"

temperature = 0.0     # adjust as needed
max_new_token = 8192  # the model supports up to 8192 tokens
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]

output = model.generate(texts, sampling_params)
res = json.loads(output[0].outputs[0].text)
```
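When structuring many notes in a batch, a single malformed generation would make a bare `json.loads` raise and lose the rest of the results. A minimal sketch of more defensive post-processing is below; it assumes only the vLLM output shape used above (`out.outputs[0].text`) and makes no assumption about GENIE's JSON schema, keeping failed notes as `None` so indices stay aligned with the input `EHR` list:

```python
import json


def parse_genie_outputs(outputs):
    """Collect structured records from a batch of vLLM generations.

    Each element of `outputs` is expected to expose the generated text at
    .outputs[0].text, as in the snippet above. Notes whose generation is
    not valid JSON are kept as None so result indices still line up with
    the input EHR list.
    """
    results = []
    for out in outputs:
        text = out.outputs[0].text
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            results.append(None)
    return results
```

This can replace the single `json.loads` call, e.g. `res = parse_genie_outputs(output)`.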
## Citation

If you find our paper or models helpful, please consider citing it (to be released).

**BibTeX:**

[More Information Needed]