JordiBayarri committed on
Commit
4ea2be3
1 Parent(s): a4e5fc8

Upload 3 files

Files changed (3)
  1. LLAMA_3.1_COMMUNITY_LICENSE_AGREEMENT +32 -0
  2. Notice +1 -0
  3. README.md +365 -0
LLAMA_3.1_COMMUNITY_LICENSE_AGREEMENT ADDED
LLAMA 3.1 COMMUNITY LICENSE AGREEMENT

Llama 3.1 Version Release Date: July 23, 2024
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 3.1 distributed by Meta at https://llama.com/docs/overview.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 3.1” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at https://llama.com/llama-downloads.
“Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).

By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.

i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name.

ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.

iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://llama.com/llama3_1/use-policy), which is hereby incorporated by reference into this Agreement.
2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at https://about.meta.com/brand/resources/meta/company-brand/). All goodwill arising out of your use of the Mark will inure to the benefit of Meta.

b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
Notice ADDED
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
README.md ADDED
---
license: cc-by-4.0
datasets:
- HPAI-BSC/Aloe-Beta-General-Collection
- HPAI-BSC/chain-of-diagnosis
- HPAI-BSC/MedS-Ins
- HPAI-BSC/ultramedical
- HPAI-BSC/pubmedqa-cot-llama31
- HPAI-BSC/medqa-cot-llama31
- HPAI-BSC/medmcqa-cot-llama31
- HPAI-BSC/headqa-cot-llama31
- HPAI-BSC/MMLU-medical-cot-llama31
- HPAI-BSC/Polymed-QA
language:
- en
library_name: transformers
tags:
- biology
- medical
- healthcare
pipeline_tag: question-answering
---
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/ymLVs-prfhvlT7RF2NnCW.png">
<img alt="prompt_engine" src="https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/ymLVs-prfhvlT7RF2NnCW.png" width=50%>
</picture>
</p>
<h1 align="center">
Aloe: A Family of Fine-tuned Open Healthcare LLMs
</h1>

---

Llama3.1-Aloe-8B-Beta is an **open healthcare LLM** (released with a permissive CC-BY license) achieving **state-of-the-art performance** on several medical tasks. Aloe Beta is made available in two model sizes: [8B](https://huggingface.co/HPAI-BSC/Llama31-Aloe-Beta-8B) and [70B](https://huggingface.co/HPAI-BSC/Llama31-Aloe-Beta-70B). Both models are trained using the same recipe. All necessary resources and details are made available below.

Aloe is trained on 10 medical tasks, resulting in a robust and versatile healthcare model. Evaluations show Aloe models to be among the best in their class. When combined with a RAG system ([also released](https://github.com/HPAI-BSC/prompt_engine)), the 8B version gets close to the performance of closed models like MedPalm-2 and GPT-4 with Medprompt. With the same RAG system, Aloe-Beta-70B outperforms those private alternatives, producing state-of-the-art results.

# Aloe-8B-Beta

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/aKgsCiRRGInnY1WS4ObXl.png)

Aloe-8B-Beta is the latest iteration in the Aloe family, building and improving on the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha).
Beta more than triples the training data used by Alpha, for a total of 1.8B tokens, including a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/bCuV5kZUT9H9UECAOWDRc.png)

Beta also boosts the alignment and safety stages with respect to Alpha. This includes a [medical preference dataset](https://huggingface.co/datasets/TsinghuaC3I/UltraMedical-Preference), as well as the red-teaming dataset (available soon).

Complete training details, model merging configurations, and all training data (including synthetically generated data) can be found below. This includes [the RAG system](https://github.com/HPAI-BSC/prompt_engine) that was developed to test Aloe Beta in a deployment setup. Aloe comes with a healthcare-specific risk assessment to facilitate the safe use and deployment of such systems.


## Model Details

### Model Description

- **Developed by:** [HPAI](https://hpai.bsc.es/)
- **Model type:** Causal decoder-only transformer language model
- **Language(s) (NLP):** English (capable but not formally evaluated on other languages)
- **License:** This model is based on Meta Llama 3.1 8B and is governed by the [Meta Llama 3 License](https://www.llama.com/llama3_1/license/). All our modifications are available with a [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license, making the Aloe Beta models **compatible with commercial use**.
- **Base model:** [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- **Paper:** (more coming soon)
- **RAG Repository:** https://github.com/HPAI-BSC/prompt_engine

## Model Performance

Aloe Beta has been tested on the most popular healthcare QA datasets, with and without the Medprompt inference technique. Results show competitive performance, achieving SOTA within models of the same size.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/j4KJOdnejJuNSsEm_dcuV.png)

The Beta model has been developed to excel in several different medical tasks. For this reason, we evaluated it across many different medical tasks:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/ZABYUxpQRMDcrJmKhkEfz.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/2NW3im0aH2u6RKp969sjx.png)

We also compared the performance of the model in the general domain, using the OpenLLM Leaderboard benchmark:

TO BE UPDATED

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/Ym6v3LsMdfwetXbg6twQP.png)

## Uses

### Direct Use

We encourage the use of Aloe for research purposes, as a stepping stone to build better foundational models for healthcare. In production, Aloe should always be used under the supervision of a human expert.

### Out-of-Scope Use

These models are not to be used for clinical practice, medical diagnosis, or any other form of direct or indirect healthcare advice. Models are prone to error and can produce toxic content. The use of Aloe models for activities harmful to individuals, such as spam, fraud, or impersonation, is strictly prohibited. Minors should not be left alone to interact with Aloe without supervision.

## Bias, Risks, and Limitations

Aloe can produce toxic content under the appropriate prompts, and it includes multiple undesirable biases. While significant efforts were conducted to mitigate this (see Alignment details below), model safety cannot be fully guaranteed. We avoid the use of all personal data in our training.

We identify at least three risk cases specific to healthcare LLMs:
- Healthcare professional impersonation, a fraudulent behaviour which currently generates billions of dollars in [profit](https://www.justice.gov/opa/pr/justice-department-charges-dozens-12-billion-health-care-fraud). A model such as Aloe could be used to increase the efficacy of such deceptive activities, making them more widespread. The main preventive actions are public literacy on the unreliability of digitised information and the importance of medical registration, and legislation enforcing AI-generated content disclaimers.
- Medical decision-making without professional supervision. While this is already an issue in modern societies (e.g., self-medication), a model such as Aloe, capable of producing high-quality conversational data, can facilitate self-delusion, particularly in the presence of sycophancy. By producing tailored responses, it can also be used to generate actionable answers. Public literacy on the dangers of self-diagnosis is one of the main defenses, together with the introduction of disclaimers and warnings on the models' outputs.
- Access to information on dangerous substances or procedures. While the literature on sensitive content can already be found in different sources (e.g., libraries, the internet, the dark web), LLMs can centralize such access, making it nearly impossible to control the flow of such information. Model alignment can help in that regard, but so far the effects remain insufficient, as jailbreaking methods still overcome it.

<!---
Table below shows the performance of Aloe at several AI safety tasks:

TO BE UPDATED

<img src="https://cdn-uploads.huggingface.co/production/uploads/62972c4979f193515da1d38e/T6Jblpf1kmTkM04K716rM.png" width="95%">

We analyzed the safety and robustness of the model using red teaming techniques. We designed a benchmark using different types of attacks and analyzed the performance of Aloe and some extra models, and we confirm that our model is aligned properly and successfully resisting most attacks:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/KS3yrHan1l1W0cYiXGG-G.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/SYC0qljpLGLmMgx0a623W.png)
-->

## How to Get Started with the Model

Use the code below to get started with the model. You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.

#### Transformers pipeline

```python
import transformers
import torch

model_id = "HPAI-BSC/Llama31-Aloe-Beta-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert medical assistant named Aloe, developed by the High Performance Artificial Intelligence Group at Barcelona Supercomputing Center (BSC). You are to be a helpful, respectful, and honest assistant."},
    {"role": "user", "content": "Hello."},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

#### Transformers AutoModelForCausalLM

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "HPAI-BSC/Llama31-Aloe-Beta-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert medical assistant named Aloe, developed by the High Performance Artificial Intelligence Group at Barcelona Supercomputing Center (BSC). You are to be a helpful, respectful, and honest assistant."},
    {"role": "user", "content": "Hello"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

## Training Details

### Supervised fine-tuning
SFT on top of Llama 3.1 using [axolotl](https://github.com/axolotl-ai-cloud/axolotl).

We used DeepSpeed's ZeRO-3 distributed training on the following hardware:

* 8B: 32x NVIDIA Hopper H100 64GB of the *MareNostrum 5*.
* 70B: 64x NVIDIA Hopper H100 64GB of the *MareNostrum 5*.

<!---
^^^ TO BE COMPLETED AND DETAILED ^^^
-->

#### Training Data

The training set consists of around 1.8B tokens, comprising 3 different types of data:

- Medical domain datasets:
  - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
  - [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis)
  - [HPAI-BSC/MedS-Ins](https://huggingface.co/datasets/HPAI-BSC/MedS-Ins)
  - [HPAI-BSC/ultramedical](https://huggingface.co/datasets/HPAI-BSC/ultramedical)
- Synthetic data generated using Llama3.1:
  - [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31)
  - [HPAI-BSC/medqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medqa-cot-llama31)
  - [HPAI-BSC/medmcqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medmcqa-cot-llama31)
  - [HPAI-BSC/headqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/headqa-cot-llama31)
  - [HPAI-BSC/MMLU-medical-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/MMLU-medical-cot-llama31)
  - [HPAI-BSC/Polymed-QA](https://huggingface.co/datasets/HPAI-BSC/Polymed-QA)
  - Genstruct data (coming soon)
- General data:
  - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)

#### Training parameters
- Epochs: 3
- Sequence length: 16384
- Optimizer: adamw_torch
- Learning rate: 2e-5
- Learning rate scheduler: cosine
- Warmup steps: 100
- Weight decay: 0
- Gradient checkpointing
- ZeRO-3
- Total batch size: 128
- Batch size per device: 1
- Gradient accumulation steps: 4
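
As a quick consistency check, the total batch size above follows from the per-device batch size, the gradient accumulation steps, and the GPU count (a minimal sketch; the 32 GPUs are taken from the 8B hardware entry above):

```python
# Effective (total) batch size for the 8B SFT run, derived from the
# per-device batch size, gradient accumulation, and GPU count above.
per_device_batch = 1
grad_accum_steps = 4
num_gpus = 32  # 32x H100 for the 8B run

total_batch_size = per_device_batch * grad_accum_steps * num_gpus
assert total_batch_size == 128  # matches "Total batch size: 128"
```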

### Model Merging
The trained model was merged with the Llama-3.1-Instruct model using the DARE_TIES technique. [Mergekit](https://github.com/arcee-ai/mergekit) was used to conduct the merging.

### Model Alignment
The model is aligned using the Direct Preference Optimization (DPO) technique through a two-step process:

1. General DPO Alignment: This step uses a dataset combining medical, general preference, and safety data. We used our dataset [HPAI-BSC/Aloe-Beta-DPO](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-DPO). We split the dataset into five parts, and the model was trained iteratively for one epoch on each chunk. We used a learning rate of 2e-7.
2. Red-Teaming Alignment: This step further fine-tunes the model to resist a variety of potential attacks, enhancing its robustness and security. The dataset will be shared soon. In this stage, we set the learning rate to 1e-7.

<!---
^^^ LINKS TO DPO DATA ^^^
-->

We used the [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) library. We aligned the model using 16x NVIDIA Hopper H100 64GB of the *MareNostrum 5*. Common hyperparameters:

- Sequence length: 4096
- Optimizer: Fused adam
- Total batch size: 128
- Batch size per device: 1
- Gradient accumulation steps: 8
- Beta: 0.1
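
For reference, the DPO objective optimized in both alignment steps can be sketched on a single preference pair. This is a minimal scalar sketch with made-up log-probabilities; actual training operates on batched token log-probabilities inside OpenRLHF:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probs
    under the policy (pi_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical log-probs: the policy prefers the chosen answer more
# strongly than the reference does, so the loss drops below log(2),
# the value at initialization (when policy == reference).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-13.0)
assert loss < math.log(2)
```

The `beta=0.1` default mirrors the Beta value listed above; it scales how far the policy is pushed away from the reference model.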

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- [MedQA (USMLE)](https://huggingface.co/datasets/bigbio/med_qa)
- [MedMCQA](https://huggingface.co/datasets/medmcqa)
- [PubMedQA](https://huggingface.co/datasets/bigbio/pubmed_qa)
- [MMLU-Medical](https://huggingface.co/datasets/lukaemon/mmlu)
- [MedQA-4-Option](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
- [CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA)
- [Open LLM Leaderboard 2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

<!---
^^^ MORE EVALS MISSING ^^^
-->

#### Metrics

- Accuracy: suited to the evaluation of multiple-choice question-answering tasks.

<!---
^^^ MORE METRICS MISSING ^^^
-->

#### Summary

To compare Aloe with the most competitive open models (both general purpose and healthcare-specific) we use popular healthcare datasets (PubMedQA, MedMCQA, MedQA and MMLU for six medical tasks only), together with the new and highly reliable CareQA. We produce the standard MultiMedQA score for reference, by computing the weighted average accuracy on all scores except CareQA. Additionally, we calculate the arithmetic mean across all datasets. The Medical MMLU is calculated by averaging the six medical subtasks: Anatomy, Clinical Knowledge, College Biology, College Medicine, Medical Genetics, and Professional Medicine.
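
The scoring described above can be sketched as follows. All accuracies and test-set sizes below are hypothetical placeholders for illustration, not reported results:

```python
# Hypothetical per-dataset accuracies and test-set sizes (illustrative only).
results = {
    "MedQA-4-Option": (0.70, 1273),
    "MedMCQA":        (0.60, 4183),
    "PubMedQA":       (0.75, 500),
    "MMLU-Medical":   (0.72, 1089),
    "CareQA":         (0.68, 5000),
}

# MultiMedQA-style score: accuracy weighted by dataset size, excluding CareQA.
mm = {k: v for k, v in results.items() if k != "CareQA"}
multimedqa = sum(acc * n for acc, n in mm.values()) / sum(n for _, n in mm.values())

# Arithmetic mean across all datasets, CareQA included.
mean_acc = sum(acc for acc, _ in results.values()) / len(results)
```

The weighted score lets large benchmarks like MedMCQA dominate, which is why the arithmetic mean is reported alongside it.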

Benchmark results indicate the training conducted on Aloe has boosted its performance above Llama3-8B-Instruct. Llama3-Aloe-8B-Alpha outperforms larger models like Meditron 70B, and is close to larger base models, like Yi-34B. For the former, this gain is consistent even when using SC-CoT, using their best-reported variant. All these results make Llama3-Aloe-8B-Alpha the best healthcare LLM of its size.

With the help of prompting techniques, the performance of Llama3-Aloe-8B-Alpha is significantly improved. Medprompting in particular provides a 7% increase in reported accuracy, after which Llama3-Aloe-8B-Alpha only lags behind the ten-times-bigger Llama-3-70B-Instruct. This improvement is mostly consistent across medical fields. Llama3-Aloe-8B-Alpha with Medprompting beats the performance of Meditron 70B with their self-reported 20-shot SC-CoT in MMLU med, and is slightly worse in the other benchmarks.

## Environmental Impact

- **Hardware Type:** 32x H100
- **Hours used (8B):** 544 GPU hours
- **Hours used (70B):** 4500 GPU hours
- **Hardware Provider:** Barcelona Supercomputing Center (BSC)
- **Compute Region:** Spain
- **Carbon Emitted:** 34.1 kg of CO2
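
Figures like the one above typically come from a GPU-hours × power × grid-intensity estimate. The sketch below shows the formula; every constant in it (power draw, PUE, grid intensity) is an illustrative assumption, not a BSC-reported value:

```python
# Back-of-the-envelope CO2 estimate from GPU-hours. All constants are
# illustrative assumptions, not the figures behind the reported 34.1 kg.
GPU_HOURS = 544            # 8B fine-tuning run, as reported above
GPU_POWER_KW = 0.70        # assumed average H100 draw
PUE = 1.1                  # assumed datacenter power usage effectiveness
GRID_KGCO2_PER_KWH = 0.10  # assumed low-carbon grid intensity

energy_kwh = GPU_HOURS * GPU_POWER_KW * PUE
co2_kg = energy_kwh * GRID_KGCO2_PER_KWH
```

With these placeholder constants the estimate lands in the same order of magnitude as the reported figure; the real number depends on measured power draw and the local energy mix.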

<!---
^^^ ARE CARBON EMISSIONS FOR BOTH? ^^^
-->

## Authors
Aloe Beta has been developed by the [High Performance Artificial Intelligence](https://hpai.bsc.es/) research group of the [Barcelona Supercomputing Center - BSC](https://www.bsc.es/). The main authors are [Jordi Bayarri Planas](https://huggingface.co/JordiBayarri), Ashwin Kumar Gururajan and [Dario Garcia-Gasulla](https://huggingface.co/dariog). Red teaming efforts were led by Adrian Tormos.

Contact: [hpai@bsc.es](mailto:hpai@bsc.es)

## Citations

<!---
Add the prompt engine paper below
-->

If you use this repository in a published work, please cite the corresponding papers as source:

```
@misc{gururajan2024aloe,
      title={Aloe: A Family of Fine-tuned Open Healthcare LLMs},
      author={Ashwin Kumar Gururajan and Enrique Lopez-Cuena and Jordi Bayarri-Planas and Adrian Tormos and Daniel Hinjos and Pablo Bernabeu-Perez and Anna Arias-Duart and Pablo Agustin Martin-Torres and Lucia Urcelay-Ganzabal and Marta Gonzalez-Mallo and Sergio Alvarez-Napagao and Eduard Ayguadé-Parra and Ulises Cortés and Dario Garcia-Gasulla},
      year={2024},
      eprint={2405.01886},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```