Xidong committed on
Commit
05163c8
1 Parent(s): 34c4980

Update README.md

Files changed (1)
  1. README.md +153 -0
README.md CHANGED
@@ -1,3 +1,156 @@
  ---
  license: apache-2.0
  ---
+ # Multilingual Medicine: Model, Dataset, Benchmark, Code
+
+ Covering English, Chinese, French, Hindi, Spanish, and Arabic so far
+
+
+ <p align="center">
+ 👨🏻‍💻 <a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">GitHub</a> • 📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
+ <br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English </a>
+ </p>
+
+ ![Apollo](assets/apollo_medium_final.png)
+
+ ## 🌈 Update
+
+ * **[2024.02.12]** <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> and <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a> are published! 🎉
+ * **[2024.01.23]** Apollo repo is published! 🎉
+
+
+ ## Results
+ 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a>
+
+
+ ![Apollo](assets/result.png)
+
+
+ ## Dataset & Evaluation
+
+ - Dataset
+   🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>
+
+   <details><summary>Click to expand</summary>
+
+   ![Apollo](assets/dataset.png)
+
+   - [Zip File](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/blob/main/Medbase_data-datasets.zip)
+   - [Data category](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/tree/main/train)
+     - Pretrain:
+       - data item:
+         - json_name: {data_source}_{language}_{data_type}.json
+         - data_source: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
+         - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
+         - data_type: text, qa (QA generated from text)
+         - data_type==text: list of strings
+           ```
+           [
+             "string1",
+             "string2",
+             ...
+           ]
+           ```
+         - data_type==qa: list of QA dialogues, each a flat list of strings alternating question and answer
+           ```
+           [
+             [
+               "q1",
+               "a1",
+               "q2",
+               "a2",
+               ...
+             ],
+             ...
+           ]
+           ```
+     - SFT:
+       - json_name: {data_source}_{language}.json
+       - data_source: code, general, math, medicalExam, medicalPatient
+       - data item: list of QA dialogues, each a flat list of strings alternating question and answer
+         ```
+         [
+           [
+             "q1",
+             "a1",
+             "q2",
+             "a2",
+             ...
+           ],
+           ...
+         ]
+         ```
+
+
+   </details>
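
As an illustration of the naming patterns and item layouts listed above, the following Python sketch splits a file name into its fields and regroups a flat QA list into question/answer pairs. It assumes that `data_source` never contains an underscore. The helper names (`parse_json_name`, `to_qa_pairs`) and the sample file names are hypothetical and are not part of the ApolloCorpus tooling.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def parse_json_name(name: str) -> Dict[str, str]:
    """Split a corpus file name into the fields of its naming pattern."""
    # Assumption: none of the fields contains an underscore.
    parts = Path(name).stem.split("_")
    if len(parts) == 3:    # Pretrain: {data_source}_{language}_{data_type}.json
        keys = ("data_source", "language", "data_type")
    elif len(parts) == 2:  # SFT: {data_source}_{language}.json
        keys = ("data_source", "language")
    else:
        raise ValueError(f"unexpected file name: {name}")
    return dict(zip(keys, parts))


def to_qa_pairs(dialogue: List[str]) -> List[Tuple[str, str]]:
    """Regroup a flat [q1, a1, q2, a2, ...] list into (question, answer) pairs."""
    if len(dialogue) % 2 != 0:
        raise ValueError("expected an even number of strings")
    return list(zip(dialogue[0::2], dialogue[1::2]))


# Hypothetical file names built from the patterns above.
print(parse_json_name("medicalBook_en_qa.json"))
print(parse_json_name("general_zh.json"))
print(to_qa_pairs(["q1", "a1", "q2", "a2"]))
```

Keeping each dialogue as explicit pairs makes it easier to map multi-turn items onto a chat template later, at the cost of rejecting items with an odd number of strings.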
+
+
+ - Evaluation
+   🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
+
+   <details><summary>Click to expand</summary>
+
+   - EN:
+     - [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
+     - [MedMCQA](https://huggingface.co/datasets/medmcqa/viewer/default/test)
+     - [PubMedQA](https://huggingface.co/datasets/pubmed_qa): Not used in the paper because the results fluctuated too much.
+     - [MMLU-Medical](https://huggingface.co/datasets/cais/mmlu)
+       - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+   - ZH:
+     - [MedQA-MCMLE](https://huggingface.co/datasets/bigbio/med_qa/viewer/med_qa_zh_4options_bigbio_qa/test)
+     - [CMB-single](https://huggingface.co/datasets/FreedomIntelligence/CMB): Not used in the paper
+       - 2,000 randomly sampled multiple-choice questions with a single answer.
+     - [CMMLU-Medical](https://huggingface.co/datasets/haonan-li/cmmlu)
+       - Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
+     - [CMExam](https://github.com/williamliujl/CMExam): Not used in the paper
+       - 2,000 randomly sampled multiple-choice questions
+
+
+   - ES: [HEAD-QA](https://huggingface.co/datasets/head_qa)
+   - FR: [FrenchMedMCQA](https://github.com/qanastek/FrenchMedMCQA)
+   - HI: [MMLU_HI](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Hindi)
+     - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+   - AR: [MMLU_Ara](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Arabic)
+     - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+
+
+   </details>
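
The 2,000-question subsets mentioned above could be drawn reproducibly along the following lines. This is only a sketch: the seed, the `keep` filter, and the `answer` field name are assumptions made for illustration, and the sampling procedure actually used for XMedBench is not documented in this README.

```python
import random
from typing import Callable, Dict, List, Optional


def sample_questions(
    questions: List[Dict],
    k: int = 2000,
    seed: int = 0,  # assumed seed; the original seed is not stated
    keep: Optional[Callable[[Dict], bool]] = None,
) -> List[Dict]:
    """Return up to k questions drawn without replacement from the filtered pool."""
    pool = [q for q in questions if keep is None or keep(q)]
    if len(pool) <= k:
        return pool
    rng = random.Random(seed)  # local generator, the global random state is untouched
    return rng.sample(pool, k)


# Hypothetical records: the "answer" field name is an assumption.
questions = [{"id": i, "answer": "A" if i % 3 else "AB"} for i in range(6000)]
subset = sample_questions(questions, keep=lambda q: len(q["answer"]) == 1)
print(len(subset))
```

Using a dedicated `random.Random` instance with a fixed seed means the same subset is selected on every run, which keeps benchmark scores comparable across evaluations.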
+
+
+ ## Results reproduction
+ <details><summary>Click to expand</summary>
+
+ **Waiting for Update**
+
+ </details>
+
+
+ ## Citation
+ Please use the following citations if you use our dataset for training or evaluation:
+
+ ```
+ @misc{wang2024apollo,
+   title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
+   author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
+   year={2024},
+   eprint={2403.03640},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ @misc{Apollo,
+   title={Apollo, Multilingual Medicine: Model, Dataset, Benchmark, Code},
+   author={Xidong Wang and Junyin Chen and Nuo Chen and Yidong Wang and Zhiyi Zhang and Benyou Wang},
+   year = {2024},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/FreedomIntelligence/Apollo}},
+ }
+ ```