---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: feature-extraction
tags:
- structuring
- EHR
- medical
- IE
---
# Model Card for GENIE

## Model Details

- **Model Size:** 8B (English) / 7B (Chinese)
- **Max Tokens:** 8192
- **Base model:** Llama 3.1 8B (English) / Qwen 2.5 7B (Chinese)

## Model Description
GENIE (Generative Note Information Extraction) is an end-to-end model for structuring EHR data, developed in collaboration between Sheng Yu's group (https://www.stat.tsinghua.edu.cn/teachers/shengyu/) and Tianxi Cai's group (https://dbmi.hms.harvard.edu/people/tianxi-cai). GENIE processes an entire paragraph of clinical notes in a single pass, outputting structured information on named entities, assertion statuses, locations, other relevant modifiers, clinical values, and intended purposes. This end-to-end approach simplifies the structuring pipeline, reduces errors, and enables healthcare providers to derive structured data from EHRs more efficiently, without extensive manual adjustment. Experiments show that GENIE achieves high accuracy on each of these tasks.
## Usage

```python
import json

from vllm import LLM, SamplingParams

model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"

temperature = 0.0     # adjust as needed
max_new_token = 8192  # the model supports up to 8192 tokens
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]

output = model.generate(texts, sampling_params)
res = json.loads(output[0].outputs[0].text)
```
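When structuring many notes in a batch, a single malformed generation would make a bare `json.loads` raise and lose the rest of the results. A minimal sketch of more defensive post-processing is below; it assumes only the vLLM output shape used above (`out.outputs[0].text`) and makes no assumption about GENIE's JSON schema, keeping failed notes as `None` so indices stay aligned with the input `EHR` list:

```python
import json


def parse_genie_outputs(outputs):
    """Collect structured records from a batch of vLLM generations.

    Each element of `outputs` is expected to expose the generated text at
    .outputs[0].text, as in the snippet above. Notes whose generation is
    not valid JSON are kept as None so result indices still line up with
    the input EHR list.
    """
    results = []
    for out in outputs:
        text = out.outputs[0].text
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            results.append(None)
    return results
```

This can replace the single `json.loads` call, e.g. `res = parse_genie_outputs(output)`.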
## Citation

If you find our paper or models helpful, please consider citing it (to be released).

**BibTeX:**

[More Information Needed]