---
license: llama3
base_model: meta-llama/Meta-Llama-3-70B
tags:
- generated_from_trainer
- axolotl
model-index:
- name: out
  results: []
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
---

# Dolphin 2.9.1 Llama 3 70b 🐬

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
Discord: https://discord.gg/cognitivecomputations

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

We have retrained our Llama-3-70b fine-tune to address behavioral issues in the initial 2.9 dataset. Specifically, SystemChat was making the model *too* reliant on the system prompt and occasionally caused it to reference the system prompt in its replies. We also found that generation length was at times insufficient for the task at hand, and identified Ultrachat as the culprit. To address these concerns, we removed SystemChat and Ultrachat from the dataset. It is otherwise identical to Dolphin 2.9.

Our appreciation for the sponsors of Dolphin 2.9.1:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xL40S node
- [OnDemand](https://on-demand.io/) - provided inference sponsorship

This model is based on Llama-3-70b and is governed by the [META LLAMA 3 COMMUNITY LICENSE AGREEMENT](LICENSE).

The base model has an 8k context window, and the full-weight fine-tuning used a 4k sequence length.

Training took 3 days on an 8x H100 node provided by Crusoe Cloud.

This model was trained with FFT (full fine-tuning) on parameters selected by [Laser Scanner](https://github.com/cognitivecomputations/laserRMT/blob/main/laser_scanner.py), using the ChatML prompt template format.
43
+
44
+ example:
45
+
46
+ ```
47
+ <|im_start|>system
48
+ You are Dolphin, a helpful AI assistant.<|im_end|>
49
+ <|im_start|>user
50
+ {prompt}<|im_end|>
51
+ <|im_start|>assistant
52
+
53
+ ```
54
+
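Because the prompt format is ChatML and the config below registers `<|im_end|>` as the end-of-turn token, a minimal inference sketch with Transformers could look like the following. It assumes the released tokenizer ships a ChatML chat template and that the repo id shown is correct for the published weights; adjust the model id, dtype, and generation settings for your hardware.

```python
# Minimal sketch, not an official example: load the model, render a ChatML
# prompt with the tokenizer's chat template, and generate until <|im_end|>.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.1-llama-3-70b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 70B in bf16 needs multiple large GPUs
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about dolphins."},
]

# apply_chat_template renders the ChatML turns shown above and leaves an open
# <|im_start|>assistant turn for the model to complete.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
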
Dolphin-2.9.1 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

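The exact tool-calling format is not documented in this card, so treat the snippet below purely as an illustrative sketch of one common convention: list the available functions as JSON in the system message and parse a JSON call out of the model's reply. The `get_weather` tool and the reply string are made up for the example.

```python
# Illustrative only - the tool-call schema here is an assumption, not the
# documented training format. Tools go into the system prompt as JSON, and the
# assistant is asked to answer with a JSON object we can parse.
import json

tools = [{
    "name": "get_weather",                      # hypothetical tool
    "description": "Get the current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}]

messages = [
    {"role": "system",
     "content": (
         "You are Dolphin, a helpful AI assistant with access to these functions:\n"
         + json.dumps(tools, indent=2)
         + '\nWhen a function is needed, reply with a JSON object of the form '
           '{"name": ..., "arguments": {...}} and nothing else.'
     )},
    {"role": "user", "content": "What's the weather in Lisbon right now?"},
]

# After generating a reply with the snippet above, attempt to parse a call:
reply = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'  # stand-in for model output
call = json.loads(reply)
print(call["name"], call["arguments"])
```
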
Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about [uncensored models](https://erichartford.com/uncensored-models). You are responsible for any content you create using this model. Enjoy responsibly.

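What that alignment layer looks like is up to the deployer. As a purely illustrative sketch, you could gate both the user prompt and the model's reply through your own moderation check before anything is returned; `moderate` below is a trivial placeholder for whatever classifier or policy service you actually use.

```python
# Purely illustrative guardrail wrapper - `moderate` is a placeholder for a
# real moderation model or policy engine, not part of Dolphin itself.
def moderate(text: str) -> bool:
    """Return True if the text is acceptable under your deployment policy."""
    banned_topics = ["example-banned-topic"]          # stand-in policy
    return not any(topic in text.lower() for topic in banned_topics)

def guarded_generate(generate_fn, user_prompt: str) -> str:
    """Run generation only when both the prompt and the reply pass moderation."""
    if not moderate(user_prompt):
        return "Sorry, I can't help with that request."
    reply = generate_fn(user_prompt)                  # e.g. the Transformers call above
    return reply if moderate(reply) else "Sorry, I can't share that response."
```
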
Dolphin is licensed under Meta's Llama 3 license. We grant permission for any use, including commercial, that complies with Meta's Llama 3 license. Dolphin was trained on data generated from GPT-4, among other models.

## Evals

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/NnLaOrgAud-D_L2QEOHz4.png)

## Training
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: meta-llama/Meta-Llama-3-70B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
# load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
# adapter: qlora
# lora_r: 128
# lora_alpha: 16
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_dropout: 0.05
# lora_target_linear: true

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.40.mlp.down_proj
- model.layers.44.mlp.down_proj
- model.layers.45.mlp.down_proj
- model.layers.46.mlp.down_proj
- model.layers.43.mlp.down_proj
- model.layers.52.mlp.down_proj
- model.layers.47.mlp.down_proj
- model.layers.48.mlp.down_proj
- model.layers.39.mlp.down_proj
- model.layers.49.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.53.mlp.down_proj
- model.layers.41.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.51.mlp.down_proj
- model.layers.42.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.50.mlp.down_proj
- model.layers.60.mlp.down_proj
- model.layers.76.mlp.down_proj
- model.layers.54.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.57.mlp.down_proj
- model.layers.56.mlp.down_proj
- model.layers.55.mlp.down_proj
- model.layers.77.mlp.down_proj
- model.layers.59.mlp.down_proj
- model.layers.61.mlp.down_proj
- model.layers.58.mlp.down_proj
- model.layers.65.mlp.down_proj
- model.layers.75.mlp.down_proj
- model.layers.64.mlp.down_proj
- model.layers.62.mlp.down_proj
- model.layers.68.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.66.mlp.down_proj
# mlp.gate_proj layers
- model.layers.70.mlp.gate_proj
- model.layers.71.mlp.gate_proj
- model.layers.67.mlp.gate_proj
- model.layers.58.mlp.gate_proj
- model.layers.55.mlp.gate_proj
- model.layers.57.mlp.gate_proj
- model.layers.56.mlp.gate_proj
- model.layers.66.mlp.gate_proj
- model.layers.72.mlp.gate_proj
- model.layers.69.mlp.gate_proj
- model.layers.52.mlp.gate_proj
- model.layers.54.mlp.gate_proj
- model.layers.62.mlp.gate_proj
- model.layers.60.mlp.gate_proj
- model.layers.74.mlp.gate_proj
- model.layers.59.mlp.gate_proj
- model.layers.68.mlp.gate_proj
- model.layers.61.mlp.gate_proj
- model.layers.73.mlp.gate_proj
- model.layers.53.mlp.gate_proj
- model.layers.51.mlp.gate_proj
- model.layers.63.mlp.gate_proj
- model.layers.48.mlp.gate_proj
- model.layers.49.mlp.gate_proj
- model.layers.64.mlp.gate_proj
- model.layers.50.mlp.gate_proj
- model.layers.65.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.75.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.43.mlp.gate_proj
- model.layers.77.mlp.gate_proj
- model.layers.41.mlp.gate_proj
- model.layers.42.mlp.gate_proj
# mlp.up_proj layers
- model.layers.70.mlp.up_proj
- model.layers.67.mlp.up_proj
- model.layers.66.mlp.up_proj
- model.layers.69.mlp.up_proj
- model.layers.62.mlp.up_proj
- model.layers.65.mlp.up_proj
- model.layers.63.mlp.up_proj
- model.layers.68.mlp.up_proj
- model.layers.71.mlp.up_proj
- model.layers.64.mlp.up_proj
- model.layers.61.mlp.up_proj
- model.layers.58.mlp.up_proj
- model.layers.59.mlp.up_proj
- model.layers.57.mlp.up_proj
- model.layers.55.mlp.up_proj
- model.layers.72.mlp.up_proj
- model.layers.54.mlp.up_proj
- model.layers.60.mlp.up_proj
- model.layers.56.mlp.up_proj
- model.layers.73.mlp.up_proj
- model.layers.50.mlp.up_proj
- model.layers.51.mlp.up_proj
- model.layers.53.mlp.up_proj
- model.layers.74.mlp.up_proj
- model.layers.52.mlp.up_proj
- model.layers.49.mlp.up_proj
- model.layers.30.mlp.up_proj
- model.layers.34.mlp.up_proj
- model.layers.47.mlp.up_proj
- model.layers.46.mlp.up_proj
- model.layers.48.mlp.up_proj
- model.layers.38.mlp.up_proj
- model.layers.45.mlp.up_proj
- model.layers.43.mlp.up_proj
- model.layers.29.mlp.up_proj
- model.layers.42.mlp.up_proj
# self_attn.k_proj layers
- model.layers.72.self_attn.k_proj
- model.layers.75.self_attn.k_proj
- model.layers.71.self_attn.k_proj
- model.layers.74.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.34.self_attn.k_proj
- model.layers.76.self_attn.k_proj
- model.layers.78.self_attn.k_proj
- model.layers.77.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.60.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.56.self_attn.k_proj
- model.layers.2.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.52.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.73.self_attn.k_proj
- model.layers.15.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.36.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.30.self_attn.k_proj
- model.layers.16.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.26.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.50.self_attn.o_proj
- model.layers.61.self_attn.o_proj
- model.layers.46.self_attn.o_proj
- model.layers.53.self_attn.o_proj
- model.layers.54.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.42.self_attn.o_proj
- model.layers.49.self_attn.o_proj
- model.layers.41.self_attn.o_proj
- model.layers.68.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.45.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.67.self_attn.o_proj
- model.layers.48.self_attn.o_proj
- model.layers.51.self_attn.o_proj
- model.layers.64.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.47.self_attn.o_proj
- model.layers.0.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.63.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.52.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.62.self_attn.o_proj
- model.layers.56.self_attn.o_proj
- model.layers.22.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.7.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.2.self_attn.q_proj
- model.layers.4.self_attn.q_proj
- model.layers.46.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.18.self_attn.q_proj
- model.layers.62.self_attn.q_proj
- model.layers.8.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.31.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.33.self_attn.q_proj
- model.layers.35.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.21.self_attn.q_proj
- model.layers.27.self_attn.q_proj
- model.layers.34.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.56.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.52.self_attn.q_proj
- model.layers.54.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.30.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.29.self_attn.q_proj
- model.layers.37.self_attn.q_proj
- model.layers.23.self_attn.q_proj
- model.layers.75.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.11.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.40.self_attn.v_proj
- model.layers.41.self_attn.v_proj
- model.layers.42.self_attn.v_proj
- model.layers.43.self_attn.v_proj
- model.layers.44.self_attn.v_proj
- model.layers.45.self_attn.v_proj
- model.layers.46.self_attn.v_proj
- model.layers.48.self_attn.v_proj
- model.layers.49.self_attn.v_proj
- model.layers.50.self_attn.v_proj
- model.layers.51.self_attn.v_proj
- model.layers.53.self_attn.v_proj
- model.layers.54.self_attn.v_proj
- model.layers.55.self_attn.v_proj
- model.layers.57.self_attn.v_proj
- model.layers.58.self_attn.v_proj
- model.layers.59.self_attn.v_proj
- model.layers.60.self_attn.v_proj
- model.layers.61.self_attn.v_proj
- model.layers.62.self_attn.v_proj
- model.layers.63.self_attn.v_proj
- model.layers.64.self_attn.v_proj
- model.layers.65.self_attn.v_proj
- model.layers.66.self_attn.v_proj
- model.layers.67.self_attn.v_proj
- model.layers.69.self_attn.v_proj
- model.layers.75.self_attn.v_proj
- model.layers.18.self_attn.v_proj
- model.layers.78.self_attn.v_proj
- model.layers.68.self_attn.v_proj
- model.layers.47.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.71.self_attn.v_proj
# model.norm layers

dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/llama-70b

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project: llama-3
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 5
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.00
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
```

</details><br>
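
The long `unfrozen_parameters` list is how the selective full fine-tune mentioned above is expressed: every other weight stays frozen, and only parameters whose names match those patterns receive gradients. The sketch below illustrates the idea (it is not axolotl's internal implementation) and uses an empty-weights model so nothing has to be downloaded.

```python
# Sketch of selective unfreezing: freeze everything, then re-enable gradients
# for parameter names matching the patterns from the config above.
import re
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

unfrozen_patterns = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.40.mlp.down_proj",   # ...plus the remaining entries above
]

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-70B")
with init_empty_weights():              # build the architecture without loading weights
    model = AutoModelForCausalLM.from_config(config)

for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")
```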

# workspace/axolotl/llama-70b

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.4808

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7659        | 0.0004 | 1    | 0.7454          |
| 0.5006        | 0.2501 | 587  | 0.4817          |
| 0.4807        | 0.5002 | 1174 | 0.4698          |
| 0.4758        | 0.7503 | 1761 | 0.4627          |
| 0.4969        | 1.0004 | 2348 | 0.4558          |
| 0.3604        | 1.2346 | 2935 | 0.4635          |
| 0.3817        | 1.4847 | 3522 | 0.4572          |
| 0.377         | 1.7348 | 4109 | 0.4533          |
| 0.3695        | 1.9849 | 4696 | 0.4487          |
| 0.2676        | 2.2187 | 5283 | 0.4825          |
| 0.255         | 2.4688 | 5870 | 0.4814          |
| 0.2851        | 2.7189 | 6457 | 0.4808          |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1