initial model
- README.md +156 -0
- adapter_config.json +18 -0
- adapter_model.bin +3 -0
- checkpoint-200/optimizer.pt +3 -0
- checkpoint-200/pytorch_model.bin +3 -0
- checkpoint-200/rng_state_0.pth +3 -0
- checkpoint-200/rng_state_1.pth +3 -0
- checkpoint-200/scaler.pt +3 -0
- checkpoint-200/scheduler.pt +3 -0
- checkpoint-200/trainer_state.json +144 -0
- checkpoint-200/training_args.bin +3 -0
- checkpoint-400/optimizer.pt +3 -0
- checkpoint-400/pytorch_model.bin +3 -0
- checkpoint-400/rng_state_0.pth +3 -0
- checkpoint-400/rng_state_1.pth +3 -0
- checkpoint-400/scaler.pt +3 -0
- checkpoint-400/scheduler.pt +3 -0
- checkpoint-400/trainer_state.json +272 -0
- checkpoint-400/training_args.bin +3 -0
README.md
ADDED
@@ -0,0 +1,156 @@
## Setup Notes

For this model, a VM with 2 T4 GPUs was used.

To train on both GPUs simultaneously, the following command was used to launch training:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16

Note 1. The micro batch size was increased from the default of 4 to 16. Based on other training runs, it could likely be increased further; this was a first attempt. The batch arithmetic behind these flags is sketched below.

Note 2. The output directory was initially lora-alpaca; its contents were moved to a new folder when the git repository was initialized.

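For reference, here is a minimal sketch of the effective-batch arithmetic behind these flags, assuming alpaca-lora's finetune.py behavior and the batch_size of 128 echoed in the log below:

```python
# Effective-batch arithmetic for the launch command above (a sketch; values
# mirror the params echoed at the start of the training log).
batch_size = 128             # target examples per optimizer step
micro_batch_size = 16        # per-GPU examples per forward/backward pass
world_size = 2               # WORLD_SIZE=2, one process per T4

# The target batch is split across micro-batches and GPUs:
gradient_accumulation_steps = batch_size // micro_batch_size // world_size
print(gradient_accumulation_steps)  # 4

# Sanity check: 598 max_steps * 128 examples/step is roughly 76.5k training
# examples, i.e. the dataset minus the 2000-example validation split.
```
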
## Log

(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: b-mc2/sql-create-context
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 16
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00,  2.57s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00,  2.57s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
  0%|          | 0/1 [00:00<?, ?it/s]
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.30it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.83it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
{'loss': 2.7003, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 2.566, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.03}
{'loss': 2.2648, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.05}
{'loss': 1.657, 'learning_rate': 0.00011099999999999999, 'epoch': 0.07}
{'loss': 1.1599, 'learning_rate': 0.00014099999999999998, 'epoch': 0.08}
{'loss': 0.9037, 'learning_rate': 0.00017099999999999998, 'epoch': 0.1}
{'loss': 0.8137, 'learning_rate': 0.000201, 'epoch': 0.12}
{'loss': 0.7827, 'learning_rate': 0.00023099999999999998, 'epoch': 0.13}
{'loss': 0.7554, 'learning_rate': 0.000261, 'epoch': 0.15}
{'loss': 0.7357, 'learning_rate': 0.00029099999999999997, 'epoch': 0.17}
{'loss': 0.6893, 'learning_rate': 0.0002957831325301205, 'epoch': 0.18}
{'loss': 0.6606, 'learning_rate': 0.00028975903614457827, 'epoch': 0.2}
{'loss': 0.6506, 'learning_rate': 0.0002837349397590361, 'epoch': 0.22}
{'loss': 0.6462, 'learning_rate': 0.00027771084337349395, 'epoch': 0.23}
{'loss': 0.6315, 'learning_rate': 0.0002716867469879518, 'epoch': 0.25}
{'loss': 0.6337, 'learning_rate': 0.0002656626506024096, 'epoch': 0.27}
{'loss': 0.6223, 'learning_rate': 0.00025963855421686746, 'epoch': 0.28}
{'loss': 0.6136, 'learning_rate': 0.00025361445783132525, 'epoch': 0.3}
{'loss': 0.6198, 'learning_rate': 0.00024759036144578314, 'epoch': 0.32}
{'loss': 0.6084, 'learning_rate': 0.00024156626506024095, 'epoch': 0.33}
{'eval_loss': 0.608456552028656, 'eval_runtime': 123.856, 'eval_samples_per_second': 16.148, 'eval_steps_per_second': 1.009, 'epoch': 0.33}
{'loss': 0.6021, 'learning_rate': 0.00023554216867469876, 'epoch': 0.35}
{'loss': 0.5949, 'learning_rate': 0.0002295180722891566, 'epoch': 0.37}
{'loss': 0.5972, 'learning_rate': 0.00022349397590361444, 'epoch': 0.38}
{'loss': 0.5922, 'learning_rate': 0.00021746987951807228, 'epoch': 0.4}
{'loss': 0.5876, 'learning_rate': 0.0002114457831325301, 'epoch': 0.42}
{'loss': 0.5788, 'learning_rate': 0.00020542168674698793, 'epoch': 0.43}
{'loss': 0.5894, 'learning_rate': 0.0001993975903614458, 'epoch': 0.45}
{'loss': 0.5877, 'learning_rate': 0.0001933734939759036, 'epoch': 0.47}
{'loss': 0.5835, 'learning_rate': 0.00018734939759036142, 'epoch': 0.48}
{'loss': 0.5791, 'learning_rate': 0.00018132530120481925, 'epoch': 0.5}
{'loss': 0.5841, 'learning_rate': 0.00017530120481927712, 'epoch': 0.52}
{'loss': 0.5728, 'learning_rate': 0.00016927710843373493, 'epoch': 0.53}
{'loss': 0.569, 'learning_rate': 0.00016325301204819274, 'epoch': 0.55}
{'loss': 0.5709, 'learning_rate': 0.00015722891566265058, 'epoch': 0.57}
{'loss': 0.5762, 'learning_rate': 0.00015120481927710845, 'epoch': 0.58}
{'loss': 0.5704, 'learning_rate': 0.00014518072289156626, 'epoch': 0.6}
{'loss': 0.5661, 'learning_rate': 0.0001391566265060241, 'epoch': 0.62}
{'loss': 0.5662, 'learning_rate': 0.00013313253012048193, 'epoch': 0.63}
{'loss': 0.5674, 'learning_rate': 0.00012710843373493975, 'epoch': 0.65}
{'loss': 0.5635, 'learning_rate': 0.00012108433734939758, 'epoch': 0.67}
{'eval_loss': 0.568750262260437, 'eval_runtime': 122.9061, 'eval_samples_per_second': 16.273, 'eval_steps_per_second': 1.017, 'epoch': 0.67}
{'loss': 0.5609, 'learning_rate': 0.00011506024096385541, 'epoch': 0.69}
{'loss': 0.5724, 'learning_rate': 0.00010903614457831325, 'epoch': 0.7}
{'loss': 0.5603, 'learning_rate': 0.00010301204819277107, 'epoch': 0.72}
{'loss': 0.5599, 'learning_rate': 9.698795180722891e-05, 'epoch': 0.74}
{'loss': 0.5655, 'learning_rate': 9.096385542168674e-05, 'epoch': 0.75}
{'loss': 0.5578, 'learning_rate': 8.493975903614457e-05, 'epoch': 0.77}
{'loss': 0.5577, 'learning_rate': 7.89156626506024e-05, 'epoch': 0.79}
{'loss': 0.5606, 'learning_rate': 7.289156626506024e-05, 'epoch': 0.8}
{'loss': 0.5496, 'learning_rate': 6.686746987951806e-05, 'epoch': 0.82}
{'loss': 0.5635, 'learning_rate': 6.08433734939759e-05, 'epoch': 0.84}
{'loss': 0.5522, 'learning_rate': 5.481927710843373e-05, 'epoch': 0.85}
{'loss': 0.5572, 'learning_rate': 4.879518072289156e-05, 'epoch': 0.87}
{'loss': 0.5454, 'learning_rate': 4.2771084337349395e-05, 'epoch': 0.89}
{'loss': 0.5485, 'learning_rate': 3.6746987951807227e-05, 'epoch': 0.9}
{'loss': 0.5592, 'learning_rate': 3.072289156626506e-05, 'epoch': 0.92}
{'loss': 0.5499, 'learning_rate': 2.469879518072289e-05, 'epoch': 0.94}
{'loss': 0.55, 'learning_rate': 1.867469879518072e-05, 'epoch': 0.95}
{'loss': 0.5511, 'learning_rate': 1.2650602409638553e-05, 'epoch': 0.97}
{'loss': 0.5531, 'learning_rate': 6.626506024096385e-06, 'epoch': 0.99}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 27.59s/it]
{'train_runtime': 17131.1027, 'train_samples_per_second': 4.47, 'train_steps_per_second': 0.035, 'train_loss': 0.7246327424129116, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 28.65s/it]
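The per-step metrics printed above are also saved to trainer_state.json inside each checkpoint directory below. A minimal sketch for pulling the loss curves back out of those files, assuming the checkpoint layout of this commit:

```python
import json

# trainer_state.json is written by the HF Trainer next to each checkpoint.
with open("checkpoint-400/trainer_state.json") as f:
    state = json.load(f)

# Train entries carry 'loss'; eval entries carry 'eval_loss' instead.
train_curve = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
eval_curve = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print(train_curve[-1])  # (400, 0.5635)
print(eval_curve)       # [(200, 0.6084...), (400, 0.5687...)]
```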
adapter_config.json
ADDED
@@ -0,0 +1,18 @@
{
  "base_model_name_or_path": "decapoda-research/llama-7b-hf",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
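This config plus adapter_model.bin is everything peft needs to attach the LoRA weights to the base model. A minimal loading sketch, assuming a transformers/peft version contemporary with this commit; the 8-bit loading is an assumption mirroring the bitsandbytes setup in the log, and the adapter path is illustrative:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

# Base model named in adapter_config.json above.
base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,          # assumption: matches the bitsandbytes training setup
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Attach the r=8 q_proj/v_proj LoRA weights from this repo (path illustrative).
model = PeftModel.from_pretrained(base, "./lora-alpaca")
model.eval()
```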
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e42555cfb90f4ae1dea4f60b1fabeb4b0f835742f7ce5cb5db85a88b9bd56ab1
size 16822989
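The .bin/.pt/.pth entries in this commit are git-lfs pointer files rather than the binaries themselves: each records just the spec version, a sha256 oid, and the payload size in bytes. As a small sketch (the local file name is illustrative), a downloaded artifact can be verified against its pointer:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file in 1 MiB chunks and return its sha256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# oid copied from the adapter_model.bin pointer above.
expected = "e42555cfb90f4ae1dea4f60b1fabeb4b0f835742f7ce5cb5db85a88b9bd56ab1"
assert sha256_of("adapter_model.bin") == expected
```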
checkpoint-200/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:30aed971820530f6c2286b67813628eb04406f7a8ff8684a5c487d4703ff526a
size 33661637
checkpoint-200/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:116aaa0e06aeb94d0275cb05d94d3de6ffa0777157bdb82c747abb8d64ce1e9e
size 16822989
checkpoint-200/rng_state_0.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e16e9f14f9e54b23b5659b8c0c8de15af3106d2fd7100c84972de9f0edddff5a
size 14583
checkpoint-200/rng_state_1.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:771e4f2aee11e3618b4bdba3329b0dd8fffdec0bc052288b1bac58fc0f1db62b
size 14583
checkpoint-200/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:915e54e5670dd4bd7730bf2512f0efeebe5737f3917cb2161e14f0fc5045e265
size 557
checkpoint-200/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c3951795851fd7473327989d296caf1d37edcfc8f96fd0f7135134b43e8e969
size 627
checkpoint-200/trainer_state.json
ADDED
@@ -0,0 +1,144 @@
{
  "best_metric": 0.608456552028656,
  "best_model_checkpoint": "./lora-alpaca/checkpoint-200",
  "epoch": 0.3341687552213868,
  "global_step": 200,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.02,
      "learning_rate": 2.9999999999999997e-05,
      "loss": 2.7003,
      "step": 10
    },
    {
      "epoch": 0.03,
      "learning_rate": 5.9999999999999995e-05,
      "loss": 2.566,
      "step": 20
    },
    {
      "epoch": 0.05,
      "learning_rate": 8.999999999999999e-05,
      "loss": 2.2648,
      "step": 30
    },
    {
      "epoch": 0.07,
      "learning_rate": 0.00011099999999999999,
      "loss": 1.657,
      "step": 40
    },
    {
      "epoch": 0.08,
      "learning_rate": 0.00014099999999999998,
      "loss": 1.1599,
      "step": 50
    },
    {
      "epoch": 0.1,
      "learning_rate": 0.00017099999999999998,
      "loss": 0.9037,
      "step": 60
    },
    {
      "epoch": 0.12,
      "learning_rate": 0.000201,
      "loss": 0.8137,
      "step": 70
    },
    {
      "epoch": 0.13,
      "learning_rate": 0.00023099999999999998,
      "loss": 0.7827,
      "step": 80
    },
    {
      "epoch": 0.15,
      "learning_rate": 0.000261,
      "loss": 0.7554,
      "step": 90
    },
    {
      "epoch": 0.17,
      "learning_rate": 0.00029099999999999997,
      "loss": 0.7357,
      "step": 100
    },
    {
      "epoch": 0.18,
      "learning_rate": 0.0002957831325301205,
      "loss": 0.6893,
      "step": 110
    },
    {
      "epoch": 0.2,
      "learning_rate": 0.00028975903614457827,
      "loss": 0.6606,
      "step": 120
    },
    {
      "epoch": 0.22,
      "learning_rate": 0.0002837349397590361,
      "loss": 0.6506,
      "step": 130
    },
    {
      "epoch": 0.23,
      "learning_rate": 0.00027771084337349395,
      "loss": 0.6462,
      "step": 140
    },
    {
      "epoch": 0.25,
      "learning_rate": 0.0002716867469879518,
      "loss": 0.6315,
      "step": 150
    },
    {
      "epoch": 0.27,
      "learning_rate": 0.0002656626506024096,
      "loss": 0.6337,
      "step": 160
    },
    {
      "epoch": 0.28,
      "learning_rate": 0.00025963855421686746,
      "loss": 0.6223,
      "step": 170
    },
    {
      "epoch": 0.3,
      "learning_rate": 0.00025361445783132525,
      "loss": 0.6136,
      "step": 180
    },
    {
      "epoch": 0.32,
      "learning_rate": 0.00024759036144578314,
      "loss": 0.6198,
      "step": 190
    },
    {
      "epoch": 0.33,
      "learning_rate": 0.00024156626506024095,
      "loss": 0.6084,
      "step": 200
    },
    {
      "epoch": 0.33,
      "eval_loss": 0.608456552028656,
      "eval_runtime": 123.856,
      "eval_samples_per_second": 16.148,
      "eval_steps_per_second": 1.009,
      "step": 200
    }
  ],
  "max_steps": 598,
  "num_train_epochs": 1,
  "total_flos": 1.7194991763849216e+17,
  "trial_name": null,
  "trial_params": null
}
checkpoint-200/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84c05e6ae753d45577a258d454e2eb5c09dc6745e975434d69ba5432ffa022e7
size 3579
checkpoint-400/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a64a61a46978c6b08f923fbc3765ae8f2425a7f416cdf7d53ff324dbaafa95df
size 33661637
checkpoint-400/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:583547922fd1f6caffa9b89e8e9198da03a6edbb83afc3a4d197a14afebcfda6
size 16822989
checkpoint-400/rng_state_0.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b82e02b2fa07ca70b10b69c969d68a430593f680faba711e125cd22f71d22091
size 14583
checkpoint-400/rng_state_1.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad758f3149c3dc3140b14c686eb5e4c86f1100296833b4d78d25050a3488c2ba
size 14583
checkpoint-400/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f5554cdd695760baf91670b248e922967e34f8996685d064b22c06d043999e2
size 557
checkpoint-400/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1720e92837945289e11edece0320c6d9f5e447f776cad627cbb852f656661b20
size 627
checkpoint-400/trainer_state.json
ADDED
@@ -0,0 +1,272 @@
{
  "best_metric": 0.568750262260437,
  "best_model_checkpoint": "./lora-alpaca/checkpoint-400",
  "epoch": 0.6683375104427736,
  "global_step": 400,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.02,
      "learning_rate": 2.9999999999999997e-05,
      "loss": 2.7003,
      "step": 10
    },
    {
      "epoch": 0.03,
      "learning_rate": 5.9999999999999995e-05,
      "loss": 2.566,
      "step": 20
    },
    {
      "epoch": 0.05,
      "learning_rate": 8.999999999999999e-05,
      "loss": 2.2648,
      "step": 30
    },
    {
      "epoch": 0.07,
      "learning_rate": 0.00011099999999999999,
      "loss": 1.657,
      "step": 40
    },
    {
      "epoch": 0.08,
      "learning_rate": 0.00014099999999999998,
      "loss": 1.1599,
      "step": 50
    },
    {
      "epoch": 0.1,
      "learning_rate": 0.00017099999999999998,
      "loss": 0.9037,
      "step": 60
    },
    {
      "epoch": 0.12,
      "learning_rate": 0.000201,
      "loss": 0.8137,
      "step": 70
    },
    {
      "epoch": 0.13,
      "learning_rate": 0.00023099999999999998,
      "loss": 0.7827,
      "step": 80
    },
    {
      "epoch": 0.15,
      "learning_rate": 0.000261,
      "loss": 0.7554,
      "step": 90
    },
    {
      "epoch": 0.17,
      "learning_rate": 0.00029099999999999997,
      "loss": 0.7357,
      "step": 100
    },
    {
      "epoch": 0.18,
      "learning_rate": 0.0002957831325301205,
      "loss": 0.6893,
      "step": 110
    },
    {
      "epoch": 0.2,
      "learning_rate": 0.00028975903614457827,
      "loss": 0.6606,
      "step": 120
    },
    {
      "epoch": 0.22,
      "learning_rate": 0.0002837349397590361,
      "loss": 0.6506,
      "step": 130
    },
    {
      "epoch": 0.23,
      "learning_rate": 0.00027771084337349395,
      "loss": 0.6462,
      "step": 140
    },
    {
      "epoch": 0.25,
      "learning_rate": 0.0002716867469879518,
      "loss": 0.6315,
      "step": 150
    },
    {
      "epoch": 0.27,
      "learning_rate": 0.0002656626506024096,
      "loss": 0.6337,
      "step": 160
    },
    {
      "epoch": 0.28,
      "learning_rate": 0.00025963855421686746,
      "loss": 0.6223,
      "step": 170
    },
    {
      "epoch": 0.3,
      "learning_rate": 0.00025361445783132525,
      "loss": 0.6136,
      "step": 180
    },
    {
      "epoch": 0.32,
      "learning_rate": 0.00024759036144578314,
      "loss": 0.6198,
      "step": 190
    },
    {
      "epoch": 0.33,
      "learning_rate": 0.00024156626506024095,
      "loss": 0.6084,
      "step": 200
    },
    {
      "epoch": 0.33,
      "eval_loss": 0.608456552028656,
      "eval_runtime": 123.856,
      "eval_samples_per_second": 16.148,
      "eval_steps_per_second": 1.009,
      "step": 200
    },
    {
      "epoch": 0.35,
      "learning_rate": 0.00023554216867469876,
      "loss": 0.6021,
      "step": 210
    },
    {
      "epoch": 0.37,
      "learning_rate": 0.0002295180722891566,
      "loss": 0.5949,
      "step": 220
    },
    {
      "epoch": 0.38,
      "learning_rate": 0.00022349397590361444,
      "loss": 0.5972,
      "step": 230
    },
    {
      "epoch": 0.4,
      "learning_rate": 0.00021746987951807228,
      "loss": 0.5922,
      "step": 240
    },
    {
      "epoch": 0.42,
      "learning_rate": 0.0002114457831325301,
      "loss": 0.5876,
      "step": 250
    },
    {
      "epoch": 0.43,
      "learning_rate": 0.00020542168674698793,
      "loss": 0.5788,
      "step": 260
    },
    {
      "epoch": 0.45,
      "learning_rate": 0.0001993975903614458,
      "loss": 0.5894,
      "step": 270
    },
    {
      "epoch": 0.47,
      "learning_rate": 0.0001933734939759036,
      "loss": 0.5877,
      "step": 280
    },
    {
      "epoch": 0.48,
      "learning_rate": 0.00018734939759036142,
      "loss": 0.5835,
      "step": 290
    },
    {
      "epoch": 0.5,
      "learning_rate": 0.00018132530120481925,
      "loss": 0.5791,
      "step": 300
    },
    {
      "epoch": 0.52,
      "learning_rate": 0.00017530120481927712,
      "loss": 0.5841,
      "step": 310
    },
    {
      "epoch": 0.53,
      "learning_rate": 0.00016927710843373493,
      "loss": 0.5728,
      "step": 320
    },
    {
      "epoch": 0.55,
      "learning_rate": 0.00016325301204819274,
      "loss": 0.569,
      "step": 330
    },
    {
      "epoch": 0.57,
      "learning_rate": 0.00015722891566265058,
      "loss": 0.5709,
      "step": 340
    },
    {
      "epoch": 0.58,
      "learning_rate": 0.00015120481927710845,
      "loss": 0.5762,
      "step": 350
    },
    {
      "epoch": 0.6,
      "learning_rate": 0.00014518072289156626,
      "loss": 0.5704,
      "step": 360
    },
    {
      "epoch": 0.62,
      "learning_rate": 0.0001391566265060241,
      "loss": 0.5661,
      "step": 370
    },
    {
      "epoch": 0.63,
      "learning_rate": 0.00013313253012048193,
      "loss": 0.5662,
      "step": 380
    },
    {
      "epoch": 0.65,
      "learning_rate": 0.00012710843373493975,
      "loss": 0.5674,
      "step": 390
    },
    {
      "epoch": 0.67,
      "learning_rate": 0.00012108433734939758,
      "loss": 0.5635,
      "step": 400
    },
    {
      "epoch": 0.67,
      "eval_loss": 0.568750262260437,
      "eval_runtime": 122.9061,
      "eval_samples_per_second": 16.273,
      "eval_steps_per_second": 1.017,
      "step": 400
    }
  ],
  "max_steps": 598,
  "num_train_epochs": 1,
  "total_flos": 3.4431112456647475e+17,
  "trial_name": null,
  "trial_params": null
}
checkpoint-400/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84c05e6ae753d45577a258d454e2eb5c09dc6745e975434d69ba5432ffa022e7
size 3579