---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: feature-extraction
tags:
- structuring
- EHR
- medical
- IE
---

# Model Card for GENIE

## Model Details

- Model Size: 8B (English) / 7B (Chinese)
- Max Tokens: 8192
- Base model: Llama 3.1 8B (English) / Qwen 2.5 7B (Chinese)

### Model Description

GENIE (Generative Note Information Extraction) is an end-to-end model for structuring EHR data, developed in collaboration between Sheng Yu's group (https://www.stat.tsinghua.edu.cn/teachers/shengyu/) and Tianxi Cai's group (https://dbmi.hms.harvard.edu/people/tianxi-cai).

GENIE processes an entire paragraph of clinical notes in a single pass, outputting structured information on named entities, assertion statuses, locations, other relevant modifiers, clinical values, and intended purposes. This end-to-end approach simplifies the structuring pipeline, reduces errors, and lets healthcare providers derive structured data from EHRs more efficiently, without extensive manual adjustment. Experiments show that GENIE achieves high accuracy on each of these tasks.

## Usage

```python
import json

from vllm import LLM, SamplingParams

model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"
temperature = 0.0       # example value; adjust as needed
max_new_token = 4096    # example value; adjust as needed
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']  # each entry is one clinical note (or paragraph)
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]

output = model.generate(texts, sampling_params)
res = json.loads(output[0].outputs[0].text)  # structured result for the first note
```

## Citation

If you find our paper or models helpful, please consider citing our paper: (to be released)

**BibTeX:**

[More Information Needed]
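The exact JSON schema of GENIE's output is not restated here. The sketch below shows one way to decode a whole batch and inspect the parsed records; the field names `entity` and `assertion_status` are hypothetical placeholders mirroring the attributes listed in the model description, not confirmed keys:

```python
# Minimal sketch: decode every generated note in the batch. Each output is
# expected to be a JSON string describing the extracted structure.
structured = [json.loads(o.outputs[0].text) for o in output]

# Field names below are hypothetical placeholders; inspect the actual
# model output to confirm the real schema.
for record in structured[0]:
    print(record.get('entity'), '|', record.get('assertion_status'))
```

This assumes each generation parses as a JSON array of entity records; if decoding fails, printing the raw `o.outputs[0].text` is the quickest way to see the actual format.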