Tonic committed
Commit 3701a4c
1 Parent(s): ec009a9

Update README.md

Files changed (1): README.md (+190, -112)

README.md CHANGED
@@ -7,69 +7,52 @@ language:
  library_name: adapter-transformers
  ---

- # Model Card for {{ model_id | default("Model ID", true) }}

- <!-- Provide a quick summary of what the model is/does. -->
-
- {{ model_summary | default("", true) }}

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- {{ model_description | default("", true) }}
-
- - **Developed by:** {{ developers | default("[More Information Needed]", true)}}
- - **Funded by [optional]:** {{ funded_by | default("[More Information Needed]", true)}}
- - **Shared by [optional]:** {{ shared_by | default("[More Information Needed]", true)}}
- - **Model type:** {{ model_type | default("[More Information Needed]", true)}}
- - **Language(s) (NLP):** {{ language | default("[More Information Needed]", true)}}
- - **License:** {{ license | default("[More Information Needed]", true)}}
- - **Finetuned from model [optional]:** {{ finetuned_from | default("[More Information Needed]", true)}}

  ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->
-
- - **Repository:** {{ repo | default("[More Information Needed]", true)}}
- - **Paper [optional]:** {{ paper | default("[More Information Needed]", true)}}
  - **Demo [optional]:** {{ demo | default("[More Information Needed]", true)}}

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- {{ direct_use | default("[More Information Needed]", true)}}

  ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- {{ downstream_use | default("[More Information Needed]", true)}}

- ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- {{ out_of_scope_use | default("[More Information Needed]", true)}}
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- {{ bias_risks_limitations | default("[More Information Needed]", true)}}

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- {{ bias_recommendations | default("Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.", true)}}

  ## How to Get Started with the Model

@@ -79,68 +62,50 @@ Use the code below to get started with the model.

  ## Training Details

- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- {{ training_data | default("[More Information Needed]", true)}}
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- {{ preprocessing | default("[More Information Needed]", true)}}
-
- #### Training Hyperparameters
-
- - **Training regime:** {{ training_regime | default("[More Information Needed]", true)}} <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- {{ speeds_sizes_times | default("[More Information Needed]", true)}}

- ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

- ### Testing Data, Factors & Metrics

- #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->

- {{ testing_data | default("[More Information Needed]", true)}}

- #### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- {{ testing_factors | default("[More Information Needed]", true)}}

- #### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- {{ testing_metrics | default("[More Information Needed]", true)}}

  ### Results

- {{ results | default("[More Information Needed]", true)}}

- #### Summary
-
- {{ results_summary | default("", true) }}
-
- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->
-
- {{ model_examination | default("[More Information Needed]", true)}}

  ## Environmental Impact

@@ -154,50 +119,163 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Compute Region:** {{ cloud_region | default("[More Information Needed]", true)}}
  - **Carbon Emitted:** {{ co2_emitted | default("[More Information Needed]", true)}}

- ## Technical Specifications [optional]

  ### Model Architecture and Objective

- {{ model_specs | default("[More Information Needed]", true)}}

  ### Compute Infrastructure

- {{ compute_infrastructure | default("[More Information Needed]", true)}}
-
  #### Hardware

- {{ hardware | default("[More Information Needed]", true)}}

  #### Software

- {{ software | default("[More Information Needed]", true)}}
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- {{ citation_bibtex | default("[More Information Needed]", true)}}
-
- **APA:**
-
- {{ citation_apa | default("[More Information Needed]", true)}}
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- {{ glossary | default("[More Information Needed]", true)}}
-
- ## More Information [optional]
-
- {{ more_information | default("[More Information Needed]", true)}}

  ## Model Card Authors [optional]

- {{ model_card_authors | default("[More Information Needed]", true)}}

  ## Model Card Contact

- {{ model_card_contact | default("[More Information Needed]", true)}}
@@ -7,69 +7,52 @@ language:
  library_name: adapter-transformers
  ---

+ # Model Card for K23 MiniMed

+ This is a Mistral 7B Beta medical fine-tune, trained for a small number of steps and inspired by [Wonhyeong Seo](https://www.huggingface.co/wseo)'s great mentorship during the Krew x Hugging Face 2023 hackathon.

  ## Model Details

  ### Model Description

+ - **Developed by:** [Tonic](https://huggingface.co/Tonic)
+ - **Funded by [optional]:** [Tonic](https://huggingface.co/Tonic)
+ - **Shared by [optional]:** K23-Krew-Hackathon
+ - **Model type:** Mistral 7B-Beta Medical Fine Tune
+ - **Language(s) (NLP):** English
+ - **License:** MIT
+ - **Finetuned from model [optional]:** [Zephyr 7B-Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

  ### Model Sources [optional]

+ - **Repository:** [github](https://github.com/Josephrp/AI-challenge-hackathon/blob/master/mistral7b-beta_finetune.ipynb)
  - **Demo [optional]:** {{ demo | default("[More Information Needed]", true)}}

  ## Uses

+ Use this model for conversational medical question answering, **for educational purposes only**!

  ### Direct Use

+ Build a Gradio chatbot app to ask medical questions and get answers conversationally, as in the sketch below.
 
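A minimal sketch of such an app (not part of the original card): it assumes the fine-tuned weights are published under the placeholder repo id `Tonic/K23-MiniMed` and that the model keeps the Zephyr chat template of its base model.

```python
# Hypothetical Gradio chatbot; the repo id and generation settings are assumptions.
import gradio as gr
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Tonic/K23-MiniMed",  # placeholder repo id
    torch_dtype=torch.float16,
    device_map="auto",
)

def respond(message, history):
    # Rebuild the conversation in the chat format the base model expects.
    messages = [{"role": "system", "content": "You are a medical education assistant, not a substitute for professional advice."}]
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": message})
    prompt = generator.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Return only the newly generated text, stripping the prompt.
    return output[0]["generated_text"][len(prompt):]

gr.ChatInterface(respond, title="K23 MiniMed (educational use only)").launch()
```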
  ### Downstream Use [optional]

+ This model is **for educational use only**.

+ Further fine-tunes and uses could include:

+ - public health & sanitation
+ - personal health & sanitation
+ - medical Q&A

  ### Recommendations

+ - always evaluate this model before use
+ - always benchmark this model before use
+ - always evaluate bias before use
+ - do not use as is; fine-tune further

  ## How to Get Started with the Model
 
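This section's body was left as the card template in this commit; as a stopgap, here is a minimal inference sketch under the same assumptions as above (placeholder repo id, Zephyr-style chat template inherited from the base model):

```python
# Hypothetical quick start; the repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tonic/K23-MiniMed"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "What are common symptoms of dehydration?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200)
# Decode only the tokens generated after the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```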
@@ -79,68 +62,50 @@ Use the code below to get started with the model.

  ## Training Details

+ | Step | Training Loss |
+ |------|--------------|
+ | 50 | 0.993800 |
+ | 100 | 0.620600 |
+ | 150 | 0.547100 |
+ | 200 | 0.524100 |
+ | 250 | 0.520500 |
+ | 300 | 0.559800 |
+ | 350 | 0.535500 |
+ | 400 | 0.505400 |
 
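The cadence of the log above (loss reported every 50 steps, 400 steps total, epoch 0.2 in the metrics further down) is consistent with a `transformers` Trainer run; a hypothetical reconstruction of the relevant arguments, where everything beyond the step counts is an assumption:

```python
from transformers import TrainingArguments

# Hypothetical: only max_steps and logging_steps are implied by the log above.
training_args = TrainingArguments(
    output_dir="k23-minimed-checkpoints",  # assumed path
    max_steps=400,
    logging_steps=50,
    per_device_train_batch_size=2,   # assumption; the reported throughput
    gradient_accumulation_steps=4,   # (1.882 samples/s / 0.235 steps/s) ~= 8 samples/step
    learning_rate=2e-4,              # assumption, a common QLoRA default
)
```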
+ ### Training Data

+ The training dataset is not further documented here; the printout below is the PEFT trainable-parameter summary for the fine-tune:

+ ```text
+ trainable params: 21260288 || all params: 3773331456 || trainable%: 0.5634354746703705
+ ```
 
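A summary in that format is what `peft` prints for a LoRA-wrapped model. A sketch of how it is produced, assuming the LoRA settings visible in the architecture dump further down (r=8, dropout 0.05, adapters on the attention, MLP, and `lm_head` projections, which together account for the 21,260,288 trainable parameters):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
lora_config = LoraConfig(
    r=8,                # matches the lora_A/lora_B shapes in the dump below
    lora_alpha=16,      # assumption; not recoverable from the dump
    lora_dropout=0.05,  # matches Dropout(p=0.05) in the dump
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# prints: trainable params: ... || all params: ... || trainable%: ...
```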
+ ### Training Procedure

+ #### Preprocessing [optional]

+ Lora32bits: LoRA adapters over a quantized base model (see the loading sketch below).
 
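The `Linear4bit` modules in the architecture dump below indicate the base weights were loaded quantized with bitsandbytes before the adapters were attached; a minimal loading sketch, with the quantization settings being assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches the Linear4bit layers in the dump
    bnb_4bit_quant_type="nf4",              # assumption
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
)
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    quantization_config=bnb_config,
    device_map="auto",
)
```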
+ #### Speeds, Sizes, Times [optional]

+ ```python
+ metrics = {'train_runtime': 1700.1608, 'train_samples_per_second': 1.882, 'train_steps_per_second': 0.235, 'total_flos': 9.585300996096e+16, 'train_loss': 0.6008514881134033, 'epoch': 0.2}
+ ```

+ (400 steps in roughly 1700 s gives the reported 0.235 steps/s; at 1.882 samples/s that works out to about 8 samples per optimizer step.)
  ### Results

+ ```python
+ TrainOutput(global_step=400, training_loss=0.6008514881134033)
+ ```

+ #### Summary

+ Training loss fell from 0.99 at step 50 to roughly 0.51 by step 400, over about 0.2 epochs.
  ## Environmental Impact

@@ -154,50 +119,163 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Compute Region:** {{ cloud_region | default("[More Information Needed]", true)}}
  - **Carbon Emitted:** {{ co2_emitted | default("[More Information Needed]", true)}}

+ ## Technical Specifications

  ### Model Architecture and Objective

+ ```python
+ PeftModelForCausalLM(
+   (base_model): LoraModel(
+     (model): MistralForCausalLM(
+       (model): MistralModel(
+         (embed_tokens): Embedding(32000, 4096)
+         (layers): ModuleList(
+           (0-31): 32 x MistralDecoderLayer(
+             (self_attn): MistralAttention(
+               (q_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=4096, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
+               )
+               (k_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=1024, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
+               )
+               (v_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=1024, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
+               )
+               (o_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=4096, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
+               )
+               (rotary_emb): MistralRotaryEmbedding()
+             )
+             (mlp): MistralMLP(
+               (gate_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=14336, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
+               )
+               (up_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=14336, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
+               )
+               (down_proj): Linear4bit(
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=14336, out_features=8, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=8, out_features=4096, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (base_layer): Linear4bit(in_features=14336, out_features=4096, bias=False)
+               )
+               (act_fn): SiLUActivation()
+             )
+             (input_layernorm): MistralRMSNorm()
+             (post_attention_layernorm): MistralRMSNorm()
+           )
+         )
+         (norm): MistralRMSNorm()
+       )
+       (lm_head): Linear(
+         in_features=4096, out_features=32000, bias=False
+         (lora_dropout): ModuleDict(
+           (default): Dropout(p=0.05, inplace=False)
+         )
+         (lora_A): ModuleDict(
+           (default): Linear(in_features=4096, out_features=8, bias=False)
+         )
+         (lora_B): ModuleDict(
+           (default): Linear(in_features=8, out_features=32000, bias=False)
+         )
+         (lora_embedding_A): ParameterDict()
+         (lora_embedding_B): ParameterDict()
+       )
+     )
+   )
+ )
+ ```
 
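A module tree like the one above is simply what printing the PEFT-wrapped model produces; a sketch, assuming a placeholder adapter repo id:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = PeftModel.from_pretrained(base, "Tonic/K23-MiniMed")  # placeholder adapter id
print(model)  # yields a PeftModelForCausalLM(...) tree like the dump above
```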
  ### Compute Infrastructure

  #### Hardware

+ A100

  #### Software

+ peft, torch, bitsandbytes, Python, Hugging Face Transformers

  ## Model Card Authors [optional]

+ [Tonic](https://huggingface.co/Tonic)

  ## Model Card Contact

+ [Tonic](https://huggingface.co/Tonic)