jtatman committed on
Commit 08cde5d
1 Parent(s): b9e5e8e

End of training

README.md CHANGED
@@ -4,6 +4,7 @@ library_name: peft
 license: apache-2.0
 tags:
 - axolotl
+- relora
 - generated_from_trainer
 model-index:
 - name: pythia-160m-storytelling
@@ -19,22 +20,37 @@ should probably proofread and complete it, then remove this comment. -->
 axolotl version: `0.4.1`
 ```yaml
 base_model: EleutherAI/pythia-160m-deduped
-load_in_8bit: false
+load_in_8bit:
 datasets:
   - path: jtatman/storywriting_combined_instruct
     type: alpaca
 dataset_prepared_path: ds-storytelling
-val_set_size: 0.05
+chat_template: inst
+val_set_size: 0.01
 adapter: lora
 lora_model_dir:
 sequence_len: 2048
 lora_r: 16
-lora_alpha: 64
+lora_alpha: 32
 lora_dropout: 0.05
 lora_target_modules:
   - query_key_value
-lora_target_linear:
+lora_target_linear: true
 lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
+lora_modules_to_save:
+  - embed_tokens
+  - lm_head
+lora_on_cpu: false
+# ReLoRA configuration
+# # Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
+# relora_steps: # Number of steps per ReLoRA restart
+# relora_warmup_steps: # Number of per-restart warmup steps
+# relora_anneal_steps: # Number of anneal steps for each relora cycle
+# relora_prune_ratio: # threshold for optimizer magnitude when pruning
+# relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
+relora_steps: 200
+relora_warmup_steps: 10
+relora_cpu_offload: false
 wandb_project: pythia
 wandb_entity:
 wandb_watch:
@@ -43,7 +59,7 @@ wandb_log_model:
 output_dir: ./outputs/lora-alpaca-pythia-160m-storytelling
 gradient_accumulation_steps: 16
 micro_batch_size: 1
-num_epochs: 5
+num_epochs: 1
 learning_rate: 0.0006
 lr_scheduler: cosine_with_restarts
 #cosine_min_lr_ratio: 0.1
@@ -58,19 +74,17 @@ xformers_attention: true
 optimizer: paged_adamw_8bit
 gpu_memory_limit: 8GiB
 hub_model_id: jtatman/pythia-160m-storytelling
-lora_on_cpu: true
-early_stopping_patience: 3
+early_stopping_patience: 2
 #resume_from_checkpoint: outputs/lora-alpaca-pythia-125m/checkpoint-51040
 auto_resume_from_checkpoints: true
 local_rank:
-weight_decay: 0.1
-chat_template: inst
+weight_decay: 0.0
 #evals_per_epoch: 4
-eval_steps: 2000
+eval_steps: 200
 logging_steps: 1
-save_steps: 2000
+save_steps: 200
 save_total_limit: 5
-warmup_steps: 1000
+warmup_steps: 100

 ```

@@ -80,7 +94,7 @@ warmup_steps: 1000

 This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 10.3843
+- Loss: 2.8975

 ## Model description

@@ -107,16 +121,18 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine_with_restarts
-- lr_scheduler_warmup_steps: 1000
-- num_epochs: 5
+- lr_scheduler_warmup_steps: 100
+- num_epochs: 1

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 5.4891 | 0.0012 | 1 | 4.5640 |
-| 8.4799 | 2.4467 | 2000 | 9.1436 |
-| 9.9198 | 4.8944 | 4000 | 10.3843 |
+| 5.5185 | 0.0012 | 1 | 4.8333 |
+| 3.7004 | 0.2348 | 200 | 3.2693 |
+| 3.52 | 0.4696 | 400 | 3.3535 |
+| 3.7836 | 0.7043 | 600 | 2.9896 |
+| 3.3058 | 0.9391 | 800 | 2.8975 |


 ### Framework versions
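
For readers mapping the axolotl keys in the yaml above onto PEFT directly, here is a minimal sketch of the corresponding adapter configuration. It is not part of this commit, and it assumes these yaml fields pass straight through to `peft.LoraConfig`; `lora_target_linear: true` (which in axolotl broadens targeting to all linear layers) is not reproduced here.

```python
# Hypothetical illustration only: the LoRA settings from the yaml above,
# expressed as a peft.LoraConfig under an assumed direct axolotl -> PEFT mapping.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                          # lora_r
    lora_alpha=32,                                 # lora_alpha
    lora_dropout=0.05,                             # lora_dropout
    target_modules=["query_key_value"],            # lora_target_modules
    modules_to_save=["embed_tokens", "lm_head"],   # lora_modules_to_save (names as given in the yaml)
    fan_in_fan_out=True,                           # lora_fan_in_fan_out (pythia/GPTNeoX specific)
    task_type="CAUSAL_LM",
)
```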
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2d341e5db83d74ccdb7d452b50b4cd56947a73515c321d0efc022725975d5ad9
-size 4731832
+oid sha256:e44ce263e6fd885f50d82ca515b9325375b43ee36ededb75acf161ce88bc2e41
+size 48
config.json CHANGED
@@ -22,7 +22,7 @@
   "rotary_emb_base": 10000,
   "rotary_pct": 0.25,
   "tie_word_embeddings": false,
-  "torch_dtype": "float16",
+  "torch_dtype": "bfloat16",
   "transformers_version": "4.41.2",
   "use_cache": false,
   "use_parallel_residual": true,
generation_config.json ADDED
@@ -0,0 +1,7 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "do_sample": true,
+  "eos_token_id": 0,
+  "transformers_version": "4.41.2"
+}
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:727fc3779b3fe223ff519f231c0c6c4a9f9092e972590479a1a23e8a5cb4c7db
+size 324696090
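
Not part of the commit, but for context: a minimal inference sketch assuming the full `pytorch_model.bin` pushed here loads as a standalone causal LM in `bfloat16` (per the updated `config.json`), with sampling enabled as in the new `generation_config.json`. The prompt wording and story topic are illustrative placeholders following the Alpaca format named in the training config.

```python
# Minimal usage sketch for jtatman/pythia-160m-storytelling (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jtatman/pythia-160m-storytelling"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Alpaca-style prompt; the instruction text is only an example.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short story about a lighthouse keeper.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=True mirrors the repo's generation_config.json.
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```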