PEFT · Safetensors · llama · Generated from Trainer
Commit 6d1f6b2 (verified) by muellerzr (HF staff) · 1 parent: 96f90aa

Upload axolotl_config.yml with huggingface_hub

Files changed (1)
  1. axolotl_config.yml (+107, -2)
axolotl_config.yml CHANGED
@@ -1,3 +1,21 @@
+ ---
+ library_name: peft
+ tags:
+ - generated_from_trainer
+ base_model: llama3-8B
+ model-index:
+ - name: qlora_decrease_lr_promptfix
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
  base_model: llama3-8B
  model_type: LlamaForCausalLM
  tokenizer_type: AutoTokenizer
@@ -53,7 +71,7 @@ group_by_length: false
  bf16: auto
  fp16:
  tf32: false
- chat_template: chatml
+ chat_template: alpaca

  gradient_checkpointing: true
  gradient_checkpointing_kwargs:
@@ -94,4 +112,91 @@ tokens:
  - "<|im_end|>"
  lora_modules_to_save:
  - embed_tokens
- - lm_head
+ - lm_head
+ ```
+
+ </details><br>
+
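The config above adds `"<|im_end|>"` as an extra token and lists `embed_tokens` and `lm_head` under `lora_modules_to_save`, the usual pattern when new tokens force the embedding matrix to grow: the resized embeddings and output head must be trained and saved alongside the adapter. Below is a minimal PEFT sketch of that pattern; the base-model id, LoRA rank/alpha, and target modules are assumptions, since they are not visible in this excerpt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed checkpoint id; the config only says `base_model: llama3-8B` (likely a local path).
BASE = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Add the extra token from the axolotl `tokens:` list and grow the embedding matrix to match.
tokenizer.add_tokens(["<|im_end|>"], special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                       # assumed; the LoRA rank is not shown in the excerpt
    lora_alpha=32,              # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    # Mirrors `lora_modules_to_save` in the config: train and save these modules in full,
    # because the resized embeddings/head cannot be recovered from the base model alone.
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```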
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/muellerzr/llama-3-8b-self-align-axolotl/runs/2q8jhm3e)
+ # qlora_decrease_lr_promptfix
+
+ This model is a PEFT fine-tune of llama3-8B on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4121
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 4
+
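For reference, the reported totals follow from the per-device settings: 2 per device × 2 GPUs × 8 accumulation steps = 32 for training, and 2 per device × 2 GPUs = 4 for evaluation. A rough `transformers.TrainingArguments` equivalent is sketched below; axolotl builds its arguments from the YAML config, so this is illustrative only, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; not the exact object axolotl constructs.
args = TrainingArguments(
    output_dir="qlora_decrease_lr_promptfix",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=2,   # x 2 GPUs x 8 accumulation steps = 32 effective
    per_device_eval_batch_size=2,    # x 2 GPUs = 4 effective
    gradient_accumulation_steps=8,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    bf16=True,                       # the config sets `bf16: auto`
)
```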
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.6903 | 0.0061 | 1 | 0.6706 |
+ | 0.6463 | 0.1285 | 21 | 0.6392 |
+ | 0.4944 | 0.2571 | 42 | 0.4806 |
+ | 0.4495 | 0.3856 | 63 | 0.4532 |
+ | 0.4444 | 0.5142 | 84 | 0.4406 |
+ | 0.4185 | 0.6427 | 105 | 0.4334 |
+ | 0.4336 | 0.7712 | 126 | 0.4286 |
+ | 0.4061 | 0.8998 | 147 | 0.4252 |
+ | 0.4002 | 1.0145 | 168 | 0.4221 |
+ | 0.4013 | 1.1431 | 189 | 0.4205 |
+ | 0.3674 | 1.2716 | 210 | 0.4189 |
+ | 0.3942 | 1.4002 | 231 | 0.4175 |
+ | 0.3984 | 1.5287 | 252 | 0.4165 |
+ | 0.3867 | 1.6572 | 273 | 0.4150 |
+ | 0.3872 | 1.7858 | 294 | 0.4137 |
+ | 0.401 | 1.9143 | 315 | 0.4130 |
+ | 0.3602 | 2.0275 | 336 | 0.4126 |
+ | 0.3817 | 2.1561 | 357 | 0.4131 |
+ | 0.3592 | 2.2846 | 378 | 0.4129 |
+ | 0.3729 | 2.4132 | 399 | 0.4127 |
+ | 0.372 | 2.5417 | 420 | 0.4121 |
+ | 0.3685 | 2.6702 | 441 | 0.4120 |
+ | 0.3732 | 2.7988 | 462 | 0.4115 |
+ | 0.38 | 2.9273 | 483 | 0.4112 |
+ | 0.3637 | 3.0413 | 504 | 0.4114 |
+ | 0.3628 | 3.1699 | 525 | 0.4118 |
+ | 0.355 | 3.2984 | 546 | 0.4122 |
+ | 0.3646 | 3.4269 | 567 | 0.4121 |
+ | 0.3496 | 3.5555 | 588 | 0.4121 |
+ | 0.3573 | 3.6840 | 609 | 0.4121 |
+ | 0.3598 | 3.8125 | 630 | 0.4121 |
+ | 0.3669 | 3.9411 | 651 | 0.4121 |
+
+
+ ### Framework versions
+
+ - PEFT 0.11.1
+ - Transformers 4.42.0.dev0
+ - Pytorch 2.3.0+cu118
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
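Since the repository holds a PEFT adapter rather than full model weights, inference means loading the adapter on top of the base model. A minimal sketch follows; the repository id is guessed from the run name, and the prompt format assumes the `chat_template: alpaca` setting above.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Guessed adapter id; replace with the actual Hub repo or local path.
ADAPTER = "muellerzr/qlora_decrease_lr_promptfix"

model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

# Alpaca-style prompt, matching `chat_template: alpaca` in the config.
prompt = "### Instruction:\nSummarize what a LoRA adapter is.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```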