---
license: llama2
datasets:
- togethercomputer/RedPajama-Data-1T
- bigbio/med_qa
language:
- en
library_name: transformers
tags:
- medical
- health
- llama
- llama2
---

# Me-LLaMA

## Model Overview

The Me-LLaMA family consists of two foundation models, Me-LLaMA 13B and Me-LLaMA 70B, along with their chat-enhanced counterparts, Me-LLaMA 13B-chat and Me-LLaMA 70B-chat, which are designed for stronger chat and instruction-following capabilities. Me-LLaMA 13B and 70B were continually pretrained from the base LLaMA 2 13B and 70B models with additional biomedical, clinical, and general domain data. The chat versions were further instruction-tuned on comprehensive medical instruction tuning data.

## Pretraining and Data

Me-LLaMA was developed through continual pretraining and instruction tuning of LLaMA 2, incorporating 129B tokens and 214K instruction tuning samples from the general, biomedical, and clinical domains. The pretraining data mixes biomedical literature, clinical notes, and general domain data in a 15:1:4 ratio (a rough token-count breakdown is sketched after the list), sourced from:

- **Biomedical:** PubMed Central and PubMed Abstracts (from the Pile dataset)
- **Clinical:** De-identified free-text clinical notes from MIMIC-III, MIMIC-IV, and MIMIC-CXR
- **General Domain:** A subset of the RedPajama dataset

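As a back-of-the-envelope illustration (not an officially reported breakdown), the 15:1:4 ratio applied to the 129B total pretraining tokens implies roughly 75% biomedical, 5% clinical, and 20% general domain data:

```python
# Hypothetical breakdown of the 15:1:4 pretraining mix.
# The ratio and the 129B total come from the model card; the per-domain
# token counts below are only implied estimates, not reported figures.
ratio = {"biomedical": 15, "clinical": 1, "general": 4}
total_tokens = 129e9

total_parts = sum(ratio.values())
for domain, parts in ratio.items():
    share = parts / total_parts
    print(f"{domain:>10}: {share:5.1%} ≈ {share * total_tokens / 1e9:5.1f}B tokens")
# ≈ 96.8B biomedical, 6.5B clinical, 25.8B general
```
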
The instruction tuning dataset includes:

- **General Domain:** Alpaca, Dolly, and ShareGPT datasets
- **Biomedical:** HealthCareMagic, Icliniq, MedInstruct, Medical Flash Cards, MEDIQA, MedicationQA, LiveQA, WikiDocPatient, Guideline QA, PubMed Central, PubMed, and the UMLS knowledge graph
- **Clinical:** MIMIC-III and MIMIC-IV

## Evaluation

Me-LLaMA was evaluated on 12 datasets spanning six task types:

- **QA:** PubMedQA, MedQA, MedMCQA, EmrQA
- **NER:** 2010 i2b2
- **Relation Extraction:** 2013 DDI
- **Classification:** HoC, MTSample
- **Text Summarization:** PubMed, MIMIC-CXR
- **NLI:** BioNLI, MedNLI

### Performance

- **Me-LLaMA 13B:** Surpassed PMC-LLaMA 13B on 11/12 datasets and LLaMA 2 13B on 10/12 datasets, and was competitive with larger models such as LLaMA 2 70B and Meditron 70B on 8/12 datasets.
- **Me-LLaMA 70B:** Outperformed LLaMA 2 70B and Meditron 70B on 9/12 datasets.
- **Zero-shot setting:** Outperformed ChatGPT on 5/8 datasets that have no privacy concerns, and GPT-4 on 1/8.
- **Task-specific instruction tuning:** Surpassed ChatGPT on 7/8 datasets and GPT-4 on 5/8.

Despite having significantly fewer parameters (13B/70B vs. 175B+ for ChatGPT and GPT-4), the Me-LLaMA models demonstrated strong performance in both supervised and in-context learning across a wide range of medical tasks.

## Model Details

Included in this repository are four models:

1. **Me-LLaMA 13B:** Continually pretrained from LLaMA 2 13B.
2. **Me-LLaMA 70B:** Continually pretrained from LLaMA 2 70B.
3. **Me-LLaMA 13B-chat:** Further instruction-tuned from Me-LLaMA 13B using a variety of general, biomedical, and clinical datasets.
4. **Me-LLaMA 70B-chat:** Further instruction-tuned from Me-LLaMA 70B using a variety of general, biomedical, and clinical datasets.

Each model contains several files, all standard for the `transformers` library (a loading sketch follows the list below):

- **config.json:** Model architecture and configuration
- **model-x-of-y.safetensors:** Sharded model weights
- **generation_config.json:** Default settings for text generation
- **special_tokens_map.json:** Special tokens used in training
- **tokenizer.json:** Mapping from indices to tokens
- **tokenizer_config.json:** Configuration file for the tokenizer

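Because the files follow the standard `transformers` layout, a model can be loaded directly from its local directory. The snippet below is a minimal sketch, assuming the weights have already been downloaded from PhysioNet; the directory name `me-llama-13b-chat` is a placeholder, not an official model identifier:

```python
# Minimal loading sketch (assumption: weights downloaded locally from PhysioNet;
# the path "me-llama-13b-chat" is a placeholder, not an official model ID).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "me-llama-13b-chat"  # local directory containing the files listed above

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # LLaMA 2 weights are typically loaded in fp16/bf16
    device_map="auto",          # requires `accelerate`; spreads layers across devices
)
```
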
## Usage

For more details and to access the models, please visit the [Me-LLaMA repository on PhysioNet](https://physionet.org/content/me-llama/1.0.0/).
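
After downloading the weights and loading a model as sketched above, generation follows the usual `transformers` pattern. The example below is purely illustrative; the prompt and sampling settings are not from the model card:

```python
# Illustrative generation example (assumes `model` and `tokenizer` from the
# loading sketch above; prompt and generation settings are placeholders).
prompt = "What are the common symptoms of type 2 diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```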