chrisdono committed · Commit 16a3ff9 · Parent: 1d7506e

initial model

README.md ADDED
@@ -0,0 +1,156 @@
## Setup Notes

For this model, a VM with 2 T4 GPUs was used.

To run training across both GPUs simultaneously, the following command was used to launch training:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16

Note 1. The micro batch size was increased from the default of 4 to 16. Judging by other training runs, it could likely be increased further; this was a first attempt. The sketch below shows how the micro batch size relates to the effective batch size.
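
With WORLD_SIZE=2 and the batch_size of 128 reported in the log below, the trainer reaches the effective batch size through gradient accumulation. A minimal sketch of that arithmetic (variable names here are illustrative, not necessarily those used in finetune.py):

```python
# Hypothetical recomputation of the gradient accumulation the trainer
# would apply; the values are taken from the training log below.
batch_size = 128        # effective (global) batch size
micro_batch_size = 16   # per-GPU batch size passed on the command line
world_size = 2          # number of GPUs (WORLD_SIZE=2)

grad_accum_steps = batch_size // (micro_batch_size * world_size)
print(grad_accum_steps)  # -> 4 micro-batches accumulated per optimizer step
```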

Note 2. The output directory was initially lora-alpaca; its contents were moved to a new folder when the git repository was initialized.
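
The resulting adapter weights (adapter_model.bin and adapter_config.json below) can be loaded on top of the base model for inference. A rough sketch, assuming compatible versions of transformers and peft; the adapter path mirrors the output directory above:

```python
# Sketch: load the base LLaMA model, then apply the LoRA adapter trained here.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = PeftModel.from_pretrained(base_model, "./lora-alpaca")  # adapter dir (assumed local path)
```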

## Log

(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: b-mc2/sql-create-context
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 16
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
  0%|                                                               | 0/1 [00:00<?, ?it/s]
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9.30it/s]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7.83it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
{'loss': 2.7003, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 2.566, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.03}
{'loss': 2.2648, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.05}
{'loss': 1.657, 'learning_rate': 0.00011099999999999999, 'epoch': 0.07}
{'loss': 1.1599, 'learning_rate': 0.00014099999999999998, 'epoch': 0.08}
{'loss': 0.9037, 'learning_rate': 0.00017099999999999998, 'epoch': 0.1}
{'loss': 0.8137, 'learning_rate': 0.000201, 'epoch': 0.12}
{'loss': 0.7827, 'learning_rate': 0.00023099999999999998, 'epoch': 0.13}
{'loss': 0.7554, 'learning_rate': 0.000261, 'epoch': 0.15}
{'loss': 0.7357, 'learning_rate': 0.00029099999999999997, 'epoch': 0.17}
{'loss': 0.6893, 'learning_rate': 0.0002957831325301205, 'epoch': 0.18}
{'loss': 0.6606, 'learning_rate': 0.00028975903614457827, 'epoch': 0.2}
{'loss': 0.6506, 'learning_rate': 0.0002837349397590361, 'epoch': 0.22}
{'loss': 0.6462, 'learning_rate': 0.00027771084337349395, 'epoch': 0.23}
{'loss': 0.6315, 'learning_rate': 0.0002716867469879518, 'epoch': 0.25}
{'loss': 0.6337, 'learning_rate': 0.0002656626506024096, 'epoch': 0.27}
{'loss': 0.6223, 'learning_rate': 0.00025963855421686746, 'epoch': 0.28}
{'loss': 0.6136, 'learning_rate': 0.00025361445783132525, 'epoch': 0.3}
{'loss': 0.6198, 'learning_rate': 0.00024759036144578314, 'epoch': 0.32}
{'loss': 0.6084, 'learning_rate': 0.00024156626506024095, 'epoch': 0.33}
{'eval_loss': 0.608456552028656, 'eval_runtime': 123.856, 'eval_samples_per_second': 16.148, 'eval_steps_per_second': 1.009, 'epoch': 0.33}
{'loss': 0.6021, 'learning_rate': 0.00023554216867469876, 'epoch': 0.35}
{'loss': 0.5949, 'learning_rate': 0.0002295180722891566, 'epoch': 0.37}
{'loss': 0.5972, 'learning_rate': 0.00022349397590361444, 'epoch': 0.38}
{'loss': 0.5922, 'learning_rate': 0.00021746987951807228, 'epoch': 0.4}
{'loss': 0.5876, 'learning_rate': 0.0002114457831325301, 'epoch': 0.42}
{'loss': 0.5788, 'learning_rate': 0.00020542168674698793, 'epoch': 0.43}
{'loss': 0.5894, 'learning_rate': 0.0001993975903614458, 'epoch': 0.45}
{'loss': 0.5877, 'learning_rate': 0.0001933734939759036, 'epoch': 0.47}
{'loss': 0.5835, 'learning_rate': 0.00018734939759036142, 'epoch': 0.48}
{'loss': 0.5791, 'learning_rate': 0.00018132530120481925, 'epoch': 0.5}
{'loss': 0.5841, 'learning_rate': 0.00017530120481927712, 'epoch': 0.52}
{'loss': 0.5728, 'learning_rate': 0.00016927710843373493, 'epoch': 0.53}
{'loss': 0.569, 'learning_rate': 0.00016325301204819274, 'epoch': 0.55}
{'loss': 0.5709, 'learning_rate': 0.00015722891566265058, 'epoch': 0.57}
{'loss': 0.5762, 'learning_rate': 0.00015120481927710845, 'epoch': 0.58}
{'loss': 0.5704, 'learning_rate': 0.00014518072289156626, 'epoch': 0.6}
{'loss': 0.5661, 'learning_rate': 0.0001391566265060241, 'epoch': 0.62}
{'loss': 0.5662, 'learning_rate': 0.00013313253012048193, 'epoch': 0.63}
{'loss': 0.5674, 'learning_rate': 0.00012710843373493975, 'epoch': 0.65}
{'loss': 0.5635, 'learning_rate': 0.00012108433734939758, 'epoch': 0.67}
{'eval_loss': 0.568750262260437, 'eval_runtime': 122.9061, 'eval_samples_per_second': 16.273, 'eval_steps_per_second': 1.017, 'epoch': 0.67}
{'loss': 0.5609, 'learning_rate': 0.00011506024096385541, 'epoch': 0.69}
{'loss': 0.5724, 'learning_rate': 0.00010903614457831325, 'epoch': 0.7}
{'loss': 0.5603, 'learning_rate': 0.00010301204819277107, 'epoch': 0.72}
{'loss': 0.5599, 'learning_rate': 9.698795180722891e-05, 'epoch': 0.74}
{'loss': 0.5655, 'learning_rate': 9.096385542168674e-05, 'epoch': 0.75}
{'loss': 0.5578, 'learning_rate': 8.493975903614457e-05, 'epoch': 0.77}
{'loss': 0.5577, 'learning_rate': 7.89156626506024e-05, 'epoch': 0.79}
{'loss': 0.5606, 'learning_rate': 7.289156626506024e-05, 'epoch': 0.8}
{'loss': 0.5496, 'learning_rate': 6.686746987951806e-05, 'epoch': 0.82}
{'loss': 0.5635, 'learning_rate': 6.08433734939759e-05, 'epoch': 0.84}
{'loss': 0.5522, 'learning_rate': 5.481927710843373e-05, 'epoch': 0.85}
{'loss': 0.5572, 'learning_rate': 4.879518072289156e-05, 'epoch': 0.87}
{'loss': 0.5454, 'learning_rate': 4.2771084337349395e-05, 'epoch': 0.89}
{'loss': 0.5485, 'learning_rate': 3.6746987951807227e-05, 'epoch': 0.9}
{'loss': 0.5592, 'learning_rate': 3.072289156626506e-05, 'epoch': 0.92}
{'loss': 0.5499, 'learning_rate': 2.469879518072289e-05, 'epoch': 0.94}
{'loss': 0.55, 'learning_rate': 1.867469879518072e-05, 'epoch': 0.95}
{'loss': 0.5511, 'learning_rate': 1.2650602409638553e-05, 'epoch': 0.97}
{'loss': 0.5531, 'learning_rate': 6.626506024096385e-06, 'epoch': 0.99}
100%|█████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 27.59s/it]
{'train_runtime': 17131.1027, 'train_samples_per_second': 4.47, 'train_steps_per_second': 0.035, 'train_loss': 0.7246327424129116, 'epoch': 1.0}
100%|█████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 28.65s/it]
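
As a sanity check, the reported train_samples_per_second is consistent with the step count and effective batch size. A quick back-of-the-envelope verification (illustrative only, using values from the log above):

```python
# Verify the throughput reported in the final log line.
steps = 598                  # total optimizer steps (one epoch)
effective_batch = 128        # batch_size from the params above
train_runtime = 17131.1027   # seconds, from the log

samples = steps * effective_batch         # 76544 samples processed
print(round(samples / train_runtime, 2))  # -> 4.47 samples/second, matching the log
```
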
adapter_config.json ADDED
@@ -0,0 +1,18 @@
{
  "base_model_name_or_path": "decapoda-research/llama-7b-hf",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
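
This config also explains the "trainable params: 4194304" line in the training log above: with r=8 applied to q_proj and v_proj in every layer, the adapter size works out exactly. A sketch of the arithmetic (the hidden size and layer count are the standard LLaMA-7B values, assumed here rather than stated in this repo):

```python
# LoRA adds two low-rank matrices per target module: A (r x d_in) and B (d_out x r).
hidden = 4096   # LLaMA-7B hidden size (assumed)
layers = 32     # LLaMA-7B decoder layers (assumed)
r = 8           # from adapter_config.json

per_module = r * hidden + hidden * r   # A + B parameters for one module
total = per_module * 2 * layers        # q_proj and v_proj in every layer
print(total)                           # -> 4194304, matching the log
```
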
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e42555cfb90f4ae1dea4f60b1fabeb4b0f835742f7ce5cb5db85a88b9bd56ab1
size 16822989
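
The file above is a Git LFS pointer: the oid is the SHA-256 of the actual adapter weights and size is their byte count. After downloading the real file, it can be checked against the pointer; a small sketch (the local filename is an assumption):

```python
# Check a downloaded file against the LFS pointer's sha256 and size.
import hashlib
import os

path = "adapter_model.bin"  # assumed local download path
expected_oid = "e42555cfb90f4ae1dea4f60b1fabeb4b0f835742f7ce5cb5db85a88b9bd56ab1"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print(digest == expected_oid, os.path.getsize(path) == 16822989)
```
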
checkpoint-200/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:30aed971820530f6c2286b67813628eb04406f7a8ff8684a5c487d4703ff526a
size 33661637
checkpoint-200/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:116aaa0e06aeb94d0275cb05d94d3de6ffa0777157bdb82c747abb8d64ce1e9e
size 16822989
checkpoint-200/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e16e9f14f9e54b23b5659b8c0c8de15af3106d2fd7100c84972de9f0edddff5a
size 14583
checkpoint-200/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:771e4f2aee11e3618b4bdba3329b0dd8fffdec0bc052288b1bac58fc0f1db62b
size 14583
checkpoint-200/scaler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:915e54e5670dd4bd7730bf2512f0efeebe5737f3917cb2161e14f0fc5045e265
size 557
checkpoint-200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c3951795851fd7473327989d296caf1d37edcfc8f96fd0f7135134b43e8e969
size 627
checkpoint-200/trainer_state.json ADDED
@@ -0,0 +1,144 @@
{
  "best_metric": 0.608456552028656,
  "best_model_checkpoint": "./lora-alpaca/checkpoint-200",
  "epoch": 0.3341687552213868,
  "global_step": 200,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.02,
      "learning_rate": 2.9999999999999997e-05,
      "loss": 2.7003,
      "step": 10
    },
    {
      "epoch": 0.03,
      "learning_rate": 5.9999999999999995e-05,
      "loss": 2.566,
      "step": 20
    },
    {
      "epoch": 0.05,
      "learning_rate": 8.999999999999999e-05,
      "loss": 2.2648,
      "step": 30
    },
    {
      "epoch": 0.07,
      "learning_rate": 0.00011099999999999999,
      "loss": 1.657,
      "step": 40
    },
    {
      "epoch": 0.08,
      "learning_rate": 0.00014099999999999998,
      "loss": 1.1599,
      "step": 50
    },
    {
      "epoch": 0.1,
      "learning_rate": 0.00017099999999999998,
      "loss": 0.9037,
      "step": 60
    },
    {
      "epoch": 0.12,
      "learning_rate": 0.000201,
      "loss": 0.8137,
      "step": 70
    },
    {
      "epoch": 0.13,
      "learning_rate": 0.00023099999999999998,
      "loss": 0.7827,
      "step": 80
    },
    {
      "epoch": 0.15,
      "learning_rate": 0.000261,
      "loss": 0.7554,
      "step": 90
    },
    {
      "epoch": 0.17,
      "learning_rate": 0.00029099999999999997,
      "loss": 0.7357,
      "step": 100
    },
    {
      "epoch": 0.18,
      "learning_rate": 0.0002957831325301205,
      "loss": 0.6893,
      "step": 110
    },
    {
      "epoch": 0.2,
      "learning_rate": 0.00028975903614457827,
      "loss": 0.6606,
      "step": 120
    },
    {
      "epoch": 0.22,
      "learning_rate": 0.0002837349397590361,
      "loss": 0.6506,
      "step": 130
    },
    {
      "epoch": 0.23,
      "learning_rate": 0.00027771084337349395,
      "loss": 0.6462,
      "step": 140
    },
    {
      "epoch": 0.25,
      "learning_rate": 0.0002716867469879518,
      "loss": 0.6315,
      "step": 150
    },
    {
      "epoch": 0.27,
      "learning_rate": 0.0002656626506024096,
      "loss": 0.6337,
      "step": 160
    },
    {
      "epoch": 0.28,
      "learning_rate": 0.00025963855421686746,
      "loss": 0.6223,
      "step": 170
    },
    {
      "epoch": 0.3,
      "learning_rate": 0.00025361445783132525,
      "loss": 0.6136,
      "step": 180
    },
    {
      "epoch": 0.32,
      "learning_rate": 0.00024759036144578314,
      "loss": 0.6198,
      "step": 190
    },
    {
      "epoch": 0.33,
      "learning_rate": 0.00024156626506024095,
      "loss": 0.6084,
      "step": 200
    },
    {
      "epoch": 0.33,
      "eval_loss": 0.608456552028656,
      "eval_runtime": 123.856,
      "eval_samples_per_second": 16.148,
      "eval_steps_per_second": 1.009,
      "step": 200
    }
  ],
  "max_steps": 598,
  "num_train_epochs": 1,
  "total_flos": 1.7194991763849216e+17,
  "trial_name": null,
  "trial_params": null
}
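
trainer_state.json carries the same loss history as the console log above in machine-readable form. A small sketch for pulling the training-loss curve out of it (the path is assumed to point at this checkpoint's file):

```python
# Extract (step, loss) pairs from a checkpoint's trainer_state.json.
import json

with open("checkpoint-200/trainer_state.json") as f:
    state = json.load(f)

# Eval entries carry "eval_loss" instead of "loss", so they are filtered out.
curve = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
print(state["best_metric"], curve[-1])  # -> 0.608456552028656 (200, 0.6084)
```
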
checkpoint-200/training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84c05e6ae753d45577a258d454e2eb5c09dc6745e975434d69ba5432ffa022e7
size 3579
checkpoint-400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a64a61a46978c6b08f923fbc3765ae8f2425a7f416cdf7d53ff324dbaafa95df
size 33661637
checkpoint-400/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:583547922fd1f6caffa9b89e8e9198da03a6edbb83afc3a4d197a14afebcfda6
size 16822989
checkpoint-400/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b82e02b2fa07ca70b10b69c969d68a430593f680faba711e125cd22f71d22091
size 14583
checkpoint-400/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad758f3149c3dc3140b14c686eb5e4c86f1100296833b4d78d25050a3488c2ba
size 14583
checkpoint-400/scaler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f5554cdd695760baf91670b248e922967e34f8996685d064b22c06d043999e2
size 557
checkpoint-400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1720e92837945289e11edece0320c6d9f5e447f776cad627cbb852f656661b20
size 627
checkpoint-400/trainer_state.json ADDED
@@ -0,0 +1,272 @@
{
  "best_metric": 0.568750262260437,
  "best_model_checkpoint": "./lora-alpaca/checkpoint-400",
  "epoch": 0.6683375104427736,
  "global_step": 400,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.02,
      "learning_rate": 2.9999999999999997e-05,
      "loss": 2.7003,
      "step": 10
    },
    {
      "epoch": 0.03,
      "learning_rate": 5.9999999999999995e-05,
      "loss": 2.566,
      "step": 20
    },
    {
      "epoch": 0.05,
      "learning_rate": 8.999999999999999e-05,
      "loss": 2.2648,
      "step": 30
    },
    {
      "epoch": 0.07,
      "learning_rate": 0.00011099999999999999,
      "loss": 1.657,
      "step": 40
    },
    {
      "epoch": 0.08,
      "learning_rate": 0.00014099999999999998,
      "loss": 1.1599,
      "step": 50
    },
    {
      "epoch": 0.1,
      "learning_rate": 0.00017099999999999998,
      "loss": 0.9037,
      "step": 60
    },
    {
      "epoch": 0.12,
      "learning_rate": 0.000201,
      "loss": 0.8137,
      "step": 70
    },
    {
      "epoch": 0.13,
      "learning_rate": 0.00023099999999999998,
      "loss": 0.7827,
      "step": 80
    },
    {
      "epoch": 0.15,
      "learning_rate": 0.000261,
      "loss": 0.7554,
      "step": 90
    },
    {
      "epoch": 0.17,
      "learning_rate": 0.00029099999999999997,
      "loss": 0.7357,
      "step": 100
    },
    {
      "epoch": 0.18,
      "learning_rate": 0.0002957831325301205,
      "loss": 0.6893,
      "step": 110
    },
    {
      "epoch": 0.2,
      "learning_rate": 0.00028975903614457827,
      "loss": 0.6606,
      "step": 120
    },
    {
      "epoch": 0.22,
      "learning_rate": 0.0002837349397590361,
      "loss": 0.6506,
      "step": 130
    },
    {
      "epoch": 0.23,
      "learning_rate": 0.00027771084337349395,
      "loss": 0.6462,
      "step": 140
    },
    {
      "epoch": 0.25,
      "learning_rate": 0.0002716867469879518,
      "loss": 0.6315,
      "step": 150
    },
    {
      "epoch": 0.27,
      "learning_rate": 0.0002656626506024096,
      "loss": 0.6337,
      "step": 160
    },
    {
      "epoch": 0.28,
      "learning_rate": 0.00025963855421686746,
      "loss": 0.6223,
      "step": 170
    },
    {
      "epoch": 0.3,
      "learning_rate": 0.00025361445783132525,
      "loss": 0.6136,
      "step": 180
    },
    {
      "epoch": 0.32,
      "learning_rate": 0.00024759036144578314,
      "loss": 0.6198,
      "step": 190
    },
    {
      "epoch": 0.33,
      "learning_rate": 0.00024156626506024095,
      "loss": 0.6084,
      "step": 200
    },
    {
      "epoch": 0.33,
      "eval_loss": 0.608456552028656,
      "eval_runtime": 123.856,
      "eval_samples_per_second": 16.148,
      "eval_steps_per_second": 1.009,
      "step": 200
    },
    {
      "epoch": 0.35,
      "learning_rate": 0.00023554216867469876,
      "loss": 0.6021,
      "step": 210
    },
    {
      "epoch": 0.37,
      "learning_rate": 0.0002295180722891566,
      "loss": 0.5949,
      "step": 220
    },
    {
      "epoch": 0.38,
      "learning_rate": 0.00022349397590361444,
      "loss": 0.5972,
      "step": 230
    },
    {
      "epoch": 0.4,
      "learning_rate": 0.00021746987951807228,
      "loss": 0.5922,
      "step": 240
    },
    {
      "epoch": 0.42,
      "learning_rate": 0.0002114457831325301,
      "loss": 0.5876,
      "step": 250
    },
    {
      "epoch": 0.43,
      "learning_rate": 0.00020542168674698793,
      "loss": 0.5788,
      "step": 260
    },
    {
      "epoch": 0.45,
      "learning_rate": 0.0001993975903614458,
      "loss": 0.5894,
      "step": 270
    },
    {
      "epoch": 0.47,
      "learning_rate": 0.0001933734939759036,
      "loss": 0.5877,
      "step": 280
    },
    {
      "epoch": 0.48,
      "learning_rate": 0.00018734939759036142,
      "loss": 0.5835,
      "step": 290
    },
    {
      "epoch": 0.5,
      "learning_rate": 0.00018132530120481925,
      "loss": 0.5791,
      "step": 300
    },
    {
      "epoch": 0.52,
      "learning_rate": 0.00017530120481927712,
      "loss": 0.5841,
      "step": 310
    },
    {
      "epoch": 0.53,
      "learning_rate": 0.00016927710843373493,
      "loss": 0.5728,
      "step": 320
    },
    {
      "epoch": 0.55,
      "learning_rate": 0.00016325301204819274,
      "loss": 0.569,
      "step": 330
    },
    {
      "epoch": 0.57,
      "learning_rate": 0.00015722891566265058,
      "loss": 0.5709,
      "step": 340
    },
    {
      "epoch": 0.58,
      "learning_rate": 0.00015120481927710845,
      "loss": 0.5762,
      "step": 350
    },
    {
      "epoch": 0.6,
      "learning_rate": 0.00014518072289156626,
      "loss": 0.5704,
      "step": 360
    },
    {
      "epoch": 0.62,
      "learning_rate": 0.0001391566265060241,
      "loss": 0.5661,
      "step": 370
    },
    {
      "epoch": 0.63,
      "learning_rate": 0.00013313253012048193,
      "loss": 0.5662,
      "step": 380
    },
    {
      "epoch": 0.65,
      "learning_rate": 0.00012710843373493975,
      "loss": 0.5674,
      "step": 390
    },
    {
      "epoch": 0.67,
      "learning_rate": 0.00012108433734939758,
      "loss": 0.5635,
      "step": 400
    },
    {
      "epoch": 0.67,
      "eval_loss": 0.568750262260437,
      "eval_runtime": 122.9061,
      "eval_samples_per_second": 16.273,
      "eval_steps_per_second": 1.017,
      "step": 400
    }
  ],
  "max_steps": 598,
  "num_train_epochs": 1,
  "total_flos": 3.4431112456647475e+17,
  "trial_name": null,
  "trial_params": null
}
checkpoint-400/training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84c05e6ae753d45577a258d454e2eb5c09dc6745e975434d69ba5432ffa022e7
size 3579