chrisdono committed on
Commit
e6bf7a9
•
1 Parent(s): 5dcd322
README.md CHANGED
@@ -6,186 +6,31 @@ Note 1. Output directory was initially lora-alpaca and then contents were moved
6
 
7
 
8
  ## Log
9
- (sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llam
10
- a-7b-hf' --data_path 'spider' --output_dir './lora-alpaca' --num_epochs 3 --batch_size 32 --micro_batch_size 16 --learning_rate '1e-4'
11
- WARNING:torch.distributed.run:
12
- *****************************************
13
- Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your appli
14
- cation as needed.
15
- *****************************************
16
-
17
-
18
- ===================================BUG REPORT===================================
19
- Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
20
- ================================================================================
21
- ===================================BUG REPORT===================================
22
- Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
23
- ================================================================================
24
- /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further path
25
- s...
26
- warn(msg)
27
- /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further path
28
- s...
29
- warn(msg)
30
- CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
31
- CUDA SETUP: Highest compute capability among GPUs detected: 7.5
32
- CUDA SETUP: Detected CUDA version 113
33
- CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
34
- CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
35
- CUDA SETUP: Highest compute capability among GPUs detected: 7.5
36
- CUDA SETUP: Detected CUDA version 113
37
- CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
38
- Training Alpaca-LoRA model with params:
39
- base_model: decapoda-research/llama-7b-hf
40
- data_path: spider
41
- output_dir: ./lora-alpaca
42
- batch_size: 32
43
- micro_batch_size: 16
44
- num_epochs: 3
45
- learning_rate: 0.0001
46
- cutoff_len: 256
47
- val_set_size: 2000
48
- lora_r: 8
49
- lora_alpha: 16
50
- lora_dropout: 0.05
51
- lora_target_modules: ['q_proj', 'v_proj']
52
- train_on_inputs: True
53
- add_eos_token: False
54
- group_by_length: False
55
- wandb_project:
56
- wandb_run_name:
57
- wandb_watch:
58
- wandb_log_model:
59
- resume_from_checkpoint: False
60
- prompt template: alpaca
61
 
62
- Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 33/33 [01:19<00:00, 2.42s/it]
63
- Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 33/33 [01:19<00:00, 2.42s/it]
64
- The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
65
- The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
66
- The class this function is called from is 'LlamaTokenizer'.
67
- The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
68
- The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
69
- The class this function is called from is 'LlamaTokenizer'.
70
- Found cached dataset spider (/home/chrisdono/.cache/huggingface/datasets/spider/spider/1.0.0/4e5143d825a3895451569c8b9b55432b91a4bc2d04d390376c950837f4680daa)
71
- 0%| | 0/2 [00:00<?, ?it/s]
72
- Found cached dataset spider (/home/chrisdono/.cache/huggingface/datasets/spider/spider/1.0.0/4e5143d825a3895451569c8b9b55432b91a4bc2d04d390376c950837f4680daa)
73
- 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00, 113.71it/s]
74
- 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00, 56.18it/s]
75
- Found cached dataset csv (/home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
76
- 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00, 214.92it/s]
77
- trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
78
- Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cac
79
- he-b310cf91933dea79.arrow and /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9632dc43aab
80
- 73df2.arrow
81
- Found cached dataset csv (/home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
82
- 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00, 787.74it/s]
83
- trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
84
- Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cac
85
- he-b310cf91933dea79.arrow and /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9632dc43aab
86
- 73df2.arrow
87
- TRAIN DATA
88
- {'Unnamed: 0': 2621, 'db_id': 'inn_1', 'query': 'SELECT decor , avg(basePrice) , min(basePrice) FROM Rooms GROUP BY decor;', 'question': 'What is the average minimum and price of the rooms f
89
- or each different decor.', 'query_toks': "['SELECT' 'decor' ',' 'avg' '(' 'basePrice' ')' ',' 'min' '(' 'basePrice'\n ')' 'FROM' 'Rooms' 'GROUP' 'BY' 'decor' ';']", 'query_toks_no_value': "['
90
- select' 'decor' ',' 'avg' '(' 'baseprice' ')' ',' 'min' '(' 'baseprice'\n ')' 'from' 'rooms' 'group' 'by' 'decor']", 'question_toks': "['What' 'is' 'the' 'average' 'minimum' 'and' 'price' 'of
91
- ' 'the' 'rooms'\n 'for' 'each' 'different' 'decor' '.']", 'db_context': "['room id', 'room name', 'beds', 'bed type', 'max occupancy', 'base price', 'decor', 'code', 'room', 'check in', 'chec
92
- k out', 'rate', 'last name', 'first name', 'adults', 'kids']", 'input_ids': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889,
93
- 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 5618, 338, 278, 6588, 9212, 322, 8666, 310, 278, 19600, 363, 1269, 1422, 10200, 2
94
- 9889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 8345, 1178, 742, 525, 8345, 1024, 742, 525, 2580, 29879, 742, 525, 2580, 1134, 742, 525, 3317, 6919, 6906, 742, 525, 3188, 8666, 742, 525, 1
95
- 9557, 742, 525, 401, 742, 525, 8345, 742, 525, 3198, 297, 742, 525, 3198, 714, 742, 525, 10492, 742, 525, 4230, 1024, 742, 525, 4102, 1024, 742, 525, 328, 499, 29879, 742, 525, 29895, 4841, 2
96
- 033, 13, 13, 2277, 29937, 13291, 29901, 13, 6404, 10200, 1919, 1029, 29887, 29898, 3188, 13026, 29897, 1919, 29871, 1375, 29898, 3188, 13026, 29897, 3895, 1528, 4835, 15345, 6770, 10200, 2993
97
- 6, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
98
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
99
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411,
100
- 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 5618, 338, 278, 6588, 9212, 322, 8666, 3
101
- 10, 278, 19600, 363, 1269, 1422, 10200, 29889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 8345, 1178, 742, 525, 8345, 1024, 742, 525, 2580, 29879, 742, 525, 2580, 1134, 742, 525, 3317, 6919
102
- , 6906, 742, 525, 3188, 8666, 742, 525, 19557, 742, 525, 401, 742, 525, 8345, 742, 525, 3198, 297, 742, 525, 3198, 714, 742, 525, 10492, 742, 525, 4230, 1024, 742, 525, 4102, 1024, 742, 525,
103
- 328, 499, 29879, 742, 525, 29895, 4841, 2033, 13, 13, 2277, 29937, 13291, 29901, 13, 6404, 10200, 1919, 1029, 29887, 29898, 3188, 13026, 29897, 1919, 29871, 1375, 29898, 3188, 13026, 29897, 3
104
- 895, 1528, 4835, 15345, 6770, 10200, 29936, 0]}
105
- TRAIN DATA
106
- {'Unnamed: 0': 4767, 'db_id': 'department_store', 'query': 'SELECT product_id FROM Order_Items GROUP BY product_id HAVING count(*) > 3 UNION SELECT product_id FROM Product_Suppliers GROUP B
107
- Y product_id HAVING sum(total_amount_purchased) > 80000', 'question': 'Return the ids of all products that were ordered more than three times or supplied more than 80000.', 'query_toks': "[
108
- 'SELECT' 'product_id' 'FROM' 'Order_Items' 'GROUP' 'BY' 'product_id'\n 'HAVING' 'count' '(' '*' ')' '>' '3' 'UNION' 'SELECT' 'product_id' 'FROM'\n 'Product_Suppliers' 'GROUP' 'BY' 'product_id
109
- ' 'HAVING' 'sum' '('\n 'total_amount_purchased' ')' '>' '80000']", 'query_toks_no_value': "['select' 'product_id' 'from' 'order_items' 'group' 'by' 'product_id'\n 'having' 'count' '(' '*' ')'
110
- '>' 'value' 'union' 'select' 'product_id'\n 'from' 'product_suppliers' 'group' 'by' 'product_id' 'having' 'sum' '('\n 'total_amount_purchased' ')' '>' 'value']", 'question_toks': "['Return'
111
- 'the' 'ids' 'of' 'all' 'products' 'that' 'were' 'ordered' 'more'\n 'than' 'three' 'times' 'or' 'supplied' 'more' 'than' '80000' '.']", 'db_context': "['address id', 'address details', 'staff
112
- id', 'staff gender', 'staff name', 'supplier id', 'supplier name', 'supplier phone', 'department store chain id', 'department store chain name', 'customer id', 'payment method code', 'custome
113
- r code', 'customer name', 'customer address', 'customer phone', 'customer email', 'product id', 'product type code', 'product name', 'product price', 'supplier id', 'address id', 'date from',
114
- 'date to', 'customer id', 'address id', 'date from', 'date to', 'order id', 'customer id', 'order status code', 'order date', 'department store id', 'department store chain id', 'store name'
115
- , 'store address', 'store phone', 'store email', 'department id', 'department store id', 'department name', 'order item id', 'order id', 'product id', 'product id', 'supplier id', 'date suppl
116
- ied from', 'date supplied to', 'total amount purchased', 'total value purchased', 'staff id', 'department id', 'date assigned from', 'job title code', 'date assigned to']", 'input_ids': [0, 1
117
- 3866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 2
118
- 9937, 2799, 4080, 29901, 13, 11609, 278, 18999, 310, 599, 9316, 393, 892, 10372, 901, 1135, 2211, 3064, 470, 19056, 901, 1135, 29871, 29947, 29900, 29900, 29900, 29900, 29889, 13, 13, 2277, 2
119
- 9937, 10567, 29901, 13, 1839, 7328, 1178, 742, 525, 7328, 4902, 742, 525, 303, 3470, 1178, 742, 525, 303, 3470, 23346, 742, 525, 303, 3470, 1024, 742, 525, 19303, 4926, 1178, 742, 525, 19303,
120
- 4926, 1024, 742, 525, 19303, 4926, 9008, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 311, 8076, 3787, 9704, 1024, 742, 525, 15539, 1178, 742, 525, 27825, 1158, 775, 742, 525, 15539, 775
121
- , 742, 525, 15539, 1024, 742, 525, 15539, 3211, 742, 525, 15539, 9008, 742, 525, 15539, 4876, 742, 525, 4704, 1178, 742, 525, 4704, 1134, 775, 742, 525, 4704, 1024, 742, 525, 4704, 8666, 742,
122
- 525, 19303, 4926, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 15539, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 2098,
123
- 1178, 742, 525, 15539, 1178, 742, 525, 2098, 4660, 775, 742, 525, 2098, 2635, 742, 525, 311, 8076, 3787, 1178, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 8899, 1024, 742, 525, 8899, 32
124
- 11, 742, 525, 8899, 9008, 742, 525, 8899, 4876, 742, 525, 311, 8076, 1178, 742, 525, 311, 8076, 3787], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
125
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
126
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
127
- , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
128
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 298
129
- 92, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 11609, 278, 18999, 3
130
- 10, 599, 9316, 393, 892, 10372, 901, 1135, 2211, 3064, 470, 19056, 901, 1135, 29871, 29947, 29900, 29900, 29900, 29900, 29889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 7328, 1178, 742, 52
131
- 5, 7328, 4902, 742, 525, 303, 3470, 1178, 742, 525, 303, 3470, 23346, 742, 525, 303, 3470, 1024, 742, 525, 19303, 4926, 1178, 742, 525, 19303, 4926, 1024, 742, 525, 19303, 4926, 9008, 742, 52
132
- 5, 311, 8076, 3787, 9704, 1178, 742, 525, 311, 8076, 3787, 9704, 1024, 742, 525, 15539, 1178, 742, 525, 27825, 1158, 775, 742, 525, 15539, 775, 742, 525, 15539, 1024, 742, 525, 15539, 3211, 7
133
- 42, 525, 15539, 9008, 742, 525, 15539, 4876, 742, 525, 4704, 1178, 742, 525, 4704, 1134, 775, 742, 525, 4704, 1024, 742, 525, 4704, 8666, 742, 525, 19303, 4926, 1178, 742, 525, 7328, 1178, 74
134
- 2, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 15539, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 2098, 1178, 742, 525, 15539, 1178, 742, 525, 2098, 466
135
- 0, 775, 742, 525, 2098, 2635, 742, 525, 311, 8076, 3787, 1178, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 8899, 1024, 742, 525, 8899, 3211, 742, 525, 8899, 9008, 742, 525, 8899, 4876, 7
136
- 42, 525, 311, 8076, 1178, 742, 525, 311, 8076, 3787]}
137
- {'loss': 2.2228, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06}
138
- {'loss': 2.185, 'learning_rate': 1.8e-05, 'epoch': 0.13}
139
- {'loss': 2.1452, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.19}
140
- {'loss': 2.0232, 'learning_rate': 3.8e-05, 'epoch': 0.25}
141
- {'loss': 1.884, 'learning_rate': 4.8e-05, 'epoch': 0.32}
142
- {'loss': 1.62, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.38}
143
- {'loss': 1.3664, 'learning_rate': 6.6e-05, 'epoch': 0.45}
144
- {'loss': 1.2159, 'learning_rate': 7.6e-05, 'epoch': 0.51}
145
- {'loss': 1.1656, 'learning_rate': 8.6e-05, 'epoch': 0.57}
146
- {'loss': 1.0664, 'learning_rate': 9.6e-05, 'epoch': 0.64}
147
- {'loss': 1.0253, 'learning_rate': 9.838274932614556e-05, 'epoch': 0.7}
148
- {'loss': 0.9716, 'learning_rate': 9.568733153638815e-05, 'epoch': 0.76}
149
- {'loss': 0.9162, 'learning_rate': 9.299191374663073e-05, 'epoch': 0.83}
150
- {'loss': 0.8849, 'learning_rate': 9.029649595687331e-05, 'epoch': 0.89}
151
- {'loss': 0.8648, 'learning_rate': 8.76010781671159e-05, 'epoch': 0.96}
152
- {'loss': 0.8077, 'learning_rate': 8.49056603773585e-05, 'epoch': 1.02}
153
- {'loss': 0.7443, 'learning_rate': 8.221024258760108e-05, 'epoch': 1.08}
154
- {'loss': 0.7253, 'learning_rate': 7.951482479784367e-05, 'epoch': 1.15}
155
- {'loss': 0.6845, 'learning_rate': 7.681940700808625e-05, 'epoch': 1.21}
156
- {'loss': 0.6956, 'learning_rate': 7.412398921832885e-05, 'epoch': 1.27}
157
- {'eval_loss': 0.6555210947990417, 'eval_runtime': 179.8763, 'eval_samples_per_second': 11.119, 'eval_steps_per_second': 0.695, 'epoch': 1.27}
158
- {'loss': 0.6293, 'learning_rate': 7.142857142857143e-05, 'epoch': 1.34}
159
- {'loss': 0.5948, 'learning_rate': 6.873315363881401e-05, 'epoch': 1.4}
160
- {'loss': 0.5306, 'learning_rate': 6.60377358490566e-05, 'epoch': 1.46}
161
- {'loss': 0.5607, 'learning_rate': 6.33423180592992e-05, 'epoch': 1.53}
162
- {'loss': 0.5095, 'learning_rate': 6.0646900269541785e-05, 'epoch': 1.59}
163
- {'loss': 0.4947, 'learning_rate': 5.795148247978437e-05, 'epoch': 1.66}
164
- {'loss': 0.4856, 'learning_rate': 5.525606469002696e-05, 'epoch': 1.72}
165
- {'loss': 0.4878, 'learning_rate': 5.2560646900269536e-05, 'epoch': 1.78}
166
- {'loss': 0.4496, 'learning_rate': 4.986522911051213e-05, 'epoch': 1.85}
167
- {'loss': 0.4544, 'learning_rate': 4.716981132075472e-05, 'epoch': 1.91}
168
- {'loss': 0.4542, 'learning_rate': 4.447439353099731e-05, 'epoch': 1.97}
169
- {'loss': 0.4556, 'learning_rate': 4.1778975741239893e-05, 'epoch': 2.04}
170
- {'loss': 0.4014, 'learning_rate': 3.908355795148248e-05, 'epoch': 2.1}
171
- {'loss': 0.3893, 'learning_rate': 3.638814016172507e-05, 'epoch': 2.17}
172
- {'loss': 0.4197, 'learning_rate': 3.369272237196766e-05, 'epoch': 2.23}
173
- {'loss': 0.3942, 'learning_rate': 3.0997304582210244e-05, 'epoch': 2.29}
174
- {'loss': 0.3967, 'learning_rate': 2.830188679245283e-05, 'epoch': 2.36}
175
- {'loss': 0.3848, 'learning_rate': 2.5606469002695423e-05, 'epoch': 2.42}
176
- {'loss': 0.3834, 'learning_rate': 2.2911051212938006e-05, 'epoch': 2.48}
177
- {'loss': 0.3647, 'learning_rate': 2.0215633423180595e-05, 'epoch': 2.55}
178
- {'eval_loss': 0.3913075923919678, 'eval_runtime': 179.5793, 'eval_samples_per_second': 11.137, 'eval_steps_per_second': 0.696, 'epoch': 2.55}
179
- {'loss': 0.3703, 'learning_rate': 1.752021563342318e-05, 'epoch': 2.61}
180
- {'loss': 0.3776, 'learning_rate': 1.4824797843665769e-05, 'epoch': 2.68}
181
- {'loss': 0.3509, 'learning_rate': 1.2129380053908356e-05, 'epoch': 2.74}
182
- {'loss': 0.3622, 'learning_rate': 9.433962264150944e-06, 'epoch': 2.8}
183
- {'loss': 0.351, 'learning_rate': 6.738544474393531e-06, 'epoch': 2.87}
184
- {'loss': 0.351, 'learning_rate': 6.738544474393531e-06, 'epoch': 2.87}
185
- {'loss': 0.3497, 'learning_rate': 4.0431266846361185e-06, 'epoch': 2.93}
186
- {'loss': 0.369, 'learning_rate': 1.3477088948787064e-06, 'epoch': 2.99}
187
- 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 471/471 [1:25:47<00:00, 8.04s/it]
188
- {'train_runtime': 5148.4044, 'train_samples_per_second': 2.914, 'train_steps_per_second': 0.091, 'train_loss': 0.7860396517057074, 'epoch': 3.0}
189
- 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 471/471 [1:25:47<00:00, 10.93s/it]
190
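
As a sanity check on the `trainable params: 4194304 || all params: 6742609920` lines in the log above, the LoRA parameter count can be reproduced from the run's config (`lora_r: 8`, `lora_target_modules: ['q_proj', 'v_proj']`). The sketch below assumes LLaMA-7B's hidden size of 4096 and 32 transformer layers, which are not stated in the log itself:

```python
# Each LoRA-adapted module adds two low-rank matrices:
# A (r x hidden) and B (hidden x r), i.e. 2 * hidden * r parameters.
hidden_size = 4096   # LLaMA-7B hidden dimension (assumption)
num_layers = 32      # LLaMA-7B transformer layers (assumption)
lora_r = 8           # from the run's config above
target_modules = 2   # q_proj and v_proj

trainable = num_layers * target_modules * 2 * hidden_size * lora_r
all_params = 6_742_609_920  # "all params" reported by the log

print(trainable)                     # 4194304, matching the log
print(100 * trainable / all_params)  # ~0.0622 %, matching the log
```

The match with the logged `trainable%: 0.06220594176090199` suggests the adapter really does touch only the query and value projections.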
9
 
10
+ (sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'spider' --output_dir './lora-alpaca' --num_epochs 10 --batch_size 32 --micro_batch_size 16 --learning_rate '9e-5' --add_eos_token
11
 
12
+
13
+ Adding the last loss values from the last checkpoint, which were not included in the trainer state json file.
14
 
15
+ {'loss': 0.241, 'learning_rate': 1.0040816326530613e-05, 'epoch': 8.98}
16
+ {'loss': 0.2343, 'learning_rate': 9.42857142857143e-06, 'epoch': 9.04}
17
+ {'loss': 0.2376, 'learning_rate': 8.816326530612245e-06, 'epoch': 9.11}
18
+ {'loss': 0.2355, 'learning_rate': 8.204081632653062e-06, 'epoch': 9.17}
19
+ {'loss': 0.229, 'learning_rate': 7.591836734693877e-06, 'epoch': 9.24}
20
+ {'loss': 0.2325, 'learning_rate': 6.979591836734694e-06, 'epoch': 9.3}
21
+ {'loss': 0.24, 'learning_rate': 6.367346938775511e-06, 'epoch': 9.36}
22
+ {'loss': 0.2438, 'learning_rate': 5.755102040816327e-06, 'epoch': 9.43}
23
+ {'loss': 0.2391, 'learning_rate': 5.142857142857143e-06, 'epoch': 9.49}
24
+ {'loss': 0.2351, 'learning_rate': 4.530612244897959e-06, 'epoch': 9.55}
25
+ {'loss': 0.2289, 'learning_rate': 3.9183673469387755e-06, 'epoch': 9.62}
26
+ {'loss': 0.2294, 'learning_rate': 3.3061224489795924e-06, 'epoch': 9.68}
27
+ {'loss': 0.2344, 'learning_rate': 2.693877551020408e-06, 'epoch': 9.75}
28
+ {'loss': 0.2358, 'learning_rate': 2.0816326530612247e-06, 'epoch': 9.81}
29
+ {'loss': 0.2365, 'learning_rate': 1.469387755102041e-06, 'epoch': 9.87}
30
+ {'loss': 0.2309, 'learning_rate': 8.571428571428572e-07, 'epoch': 9.94}
31
+ {'loss': 0.2438, 'learning_rate': 2.4489795918367347e-07, 'epoch': 10.0}
32
+ 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1570
33
+ {'train_runtime': 17144.6766, 'train_samples_per_second': 2.916, 'train_steps_per_second': 0.092, 'train_loss': 0.41175747267000234, 'epoch': 10.0}
34
+ 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1570
35
+ 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1570
36
+ /1570 [4:45:44<00:00, 10.92s/it]
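
The `{'loss': ..., 'learning_rate': ..., 'epoch': ...}` lines scattered through the logs above are plain Python dict reprs, so a pasted log can be turned back into a loss curve with the standard library alone. A minimal sketch (the two sample lines are copied verbatim from the log):

```python
import ast

# Two loss records copied verbatim from the training log above.
log_lines = [
    "{'loss': 2.2228, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06}",
    "{'loss': 0.2438, 'learning_rate': 2.4489795918367347e-07, 'epoch': 10.0}",
]

# ast.literal_eval safely parses each dict repr without executing code.
records = [ast.literal_eval(line) for line in log_lines]
epochs = [r['epoch'] for r in records]
losses = [r['loss'] for r in records]
print(list(zip(epochs, losses)))  # [(0.06, 2.2228), (10.0, 0.2438)]
```

In practice one would filter the log for lines starting with `{'loss'` before parsing; that filtering step is left out here for brevity.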
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:37fb54e8d75f814d5ffae04ff5f8644acdcdad19275a21db211de2ecb76670dd
3
  size 16822989
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1e81348636200bed953a7d22b8739df02326f7214568cb23543c97d607744996
3
  size 16822989
{checkpoint-200 β†’ checkpoint-1000}/optimizer.pt RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:68d211e82d8f1900537c205ef1633ac3840314f169f4adc5809db0bb719c102a
3
  size 33661637
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ada108aeb75370253e0951853ba7679a2e83ece2a0fb62fa917cfc8b801a6e05
3
  size 33661637
{checkpoint-400 β†’ checkpoint-1000}/pytorch_model.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:355dd19124112f4ec4f4a9e72231b69569dc35413a301bd92e0ee46c98cc1662
3
  size 16822989
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7a446829eac1cfe55026072d9943915b7e04e92ec658726682e785ee4cdd2913
3
  size 16822989
{checkpoint-400 β†’ checkpoint-1000}/rng_state_0.pth RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:be48128cb3aa02832be14199e42efe7eb640a1beb7d0d3b97912e5eb5e87bb89
3
  size 14583
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c577fb5890f86350227d881b094a3f4caad4d67e203b55f7ff03a77b90ab0cef
3
  size 14583
{checkpoint-400 β†’ checkpoint-1000}/rng_state_1.pth RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:64252d7c9f810680b0e74199d8ca1e448242a8f23f43a16f542dd555b08c5cfb
3
  size 14583
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4714127c9b51c424cf4b38037eeee13e70666b411e45a04809165e1ffbd0d4b0
3
  size 14583
{checkpoint-200 β†’ checkpoint-1000}/scaler.pt RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:426980f072dd7d4413cef6ecb824723d008ee9b3a98c121abdc77d7a7149e621
3
  size 557
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:52e4fb9b07eb9216477d8718b3245f228285c2168e19863ec6206be844186973
3
  size 557
{checkpoint-200 β†’ checkpoint-1000}/scheduler.pt RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dedd1e40b864a676379ea22469f5f6b7a5fc35c2603f57148a91444b58be0f81
3
  size 627
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:67f4442b507c07c88aff26c2e96c490aeb9a0158825e369330c7aae60a905187
3
  size 627
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,656 @@
+{
+  "best_metric": 0.2628234326839447,
+  "best_model_checkpoint": "./lora-alpaca/checkpoint-1000",
+  "epoch": 6.369426751592357,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.06,
+      "learning_rate": 7.2000000000000005e-06,
+      "loss": 2.2569,
+      "step": 10
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.62e-05,
+      "loss": 2.1837,
+      "step": 20
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 2.5200000000000003e-05,
+      "loss": 2.0973,
+      "step": 30
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 3.4200000000000005e-05,
+      "loss": 2.0635,
+      "step": 40
+    },
+    {
+      "epoch": 0.32,
+      "learning_rate": 4.32e-05,
+      "loss": 1.9233,
+      "step": 50
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 5.13e-05,
+      "loss": 1.6655,
+      "step": 60
+    },
+    {
+      "epoch": 0.45,
+      "learning_rate": 5.940000000000001e-05,
+      "loss": 1.4289,
+      "step": 70
+    },
+    {
+      "epoch": 0.51,
+      "learning_rate": 6.840000000000001e-05,
+      "loss": 1.2303,
+      "step": 80
+    },
+    {
+      "epoch": 0.57,
+      "learning_rate": 7.74e-05,
+      "loss": 1.1787,
+      "step": 90
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 8.64e-05,
+      "loss": 1.0882,
+      "step": 100
+    },
+    {
+      "epoch": 0.7,
+      "learning_rate": 8.96326530612245e-05,
+      "loss": 1.0611,
+      "step": 110
+    },
+    {
+      "epoch": 0.76,
+      "learning_rate": 8.902040816326531e-05,
+      "loss": 0.9818,
+      "step": 120
+    },
+    {
+      "epoch": 0.83,
+      "learning_rate": 8.840816326530614e-05,
+      "loss": 0.9416,
+      "step": 130
+    },
+    {
+      "epoch": 0.89,
+      "learning_rate": 8.779591836734695e-05,
+      "loss": 0.9049,
+      "step": 140
+    },
+    {
+      "epoch": 0.96,
+      "learning_rate": 8.718367346938777e-05,
+      "loss": 0.8774,
+      "step": 150
+    },
+    {
+      "epoch": 1.02,
+      "learning_rate": 8.657142857142858e-05,
+      "loss": 0.8507,
+      "step": 160
+    },
+    {
+      "epoch": 1.08,
+      "learning_rate": 8.59591836734694e-05,
+      "loss": 0.7797,
+      "step": 170
+    },
+    {
+      "epoch": 1.15,
+      "learning_rate": 8.534693877551022e-05,
+      "loss": 0.7334,
+      "step": 180
+    },
+    {
+      "epoch": 1.21,
+      "learning_rate": 8.473469387755103e-05,
+      "loss": 0.712,
+      "step": 190
+    },
+    {
+      "epoch": 1.27,
+      "learning_rate": 8.412244897959185e-05,
+      "loss": 0.6587,
+      "step": 200
+    },
+    {
+      "epoch": 1.27,
+      "eval_loss": 0.6659865379333496,
+      "eval_runtime": 179.3178,
+      "eval_samples_per_second": 11.153,
+      "eval_steps_per_second": 0.697,
+      "step": 200
+    },
+    {
+      "epoch": 1.34,
+      "learning_rate": 8.351020408163266e-05,
+      "loss": 0.6116,
+      "step": 210
+    },
+    {
+      "epoch": 1.4,
+      "learning_rate": 8.289795918367348e-05,
+      "loss": 0.6065,
+      "step": 220
+    },
+    {
+      "epoch": 1.46,
+      "learning_rate": 8.228571428571429e-05,
+      "loss": 0.6109,
+      "step": 230
+    },
+    {
+      "epoch": 1.53,
+      "learning_rate": 8.167346938775511e-05,
+      "loss": 0.5603,
+      "step": 240
+    },
+    {
+      "epoch": 1.59,
+      "learning_rate": 8.106122448979592e-05,
+      "loss": 0.5223,
+      "step": 250
+    },
+    {
+      "epoch": 1.66,
+      "learning_rate": 8.044897959183674e-05,
+      "loss": 0.5039,
+      "step": 260
+    },
+    {
+      "epoch": 1.72,
+      "learning_rate": 7.983673469387757e-05,
+      "loss": 0.4482,
+      "step": 270
+    },
+    {
+      "epoch": 1.78,
+      "learning_rate": 7.922448979591838e-05,
+      "loss": 0.4494,
+      "step": 280
+    },
+    {
+      "epoch": 1.85,
+      "learning_rate": 7.86122448979592e-05,
+      "loss": 0.4578,
+      "step": 290
+    },
+    {
+      "epoch": 1.91,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 0.4346,
+      "step": 300
+    },
+    {
+      "epoch": 1.97,
+      "learning_rate": 7.738775510204083e-05,
+      "loss": 0.4274,
+      "step": 310
+    },
+    {
+      "epoch": 2.04,
+      "learning_rate": 7.677551020408164e-05,
+      "loss": 0.3999,
+      "step": 320
+    },
+    {
+      "epoch": 2.1,
+      "learning_rate": 7.616326530612246e-05,
+      "loss": 0.3825,
+      "step": 330
+    },
+    {
+      "epoch": 2.17,
+      "learning_rate": 7.555102040816327e-05,
+      "loss": 0.382,
+      "step": 340
+    },
+    {
+      "epoch": 2.23,
+      "learning_rate": 7.493877551020409e-05,
+      "loss": 0.3652,
+      "step": 350
+    },
+    {
+      "epoch": 2.29,
+      "learning_rate": 7.43265306122449e-05,
+      "loss": 0.3732,
+      "step": 360
+    },
+    {
+      "epoch": 2.36,
+      "learning_rate": 7.371428571428572e-05,
+      "loss": 0.3422,
+      "step": 370
+    },
+    {
+      "epoch": 2.42,
+      "learning_rate": 7.310204081632653e-05,
+      "loss": 0.3661,
+      "step": 380
+    },
+    {
+      "epoch": 2.48,
+      "learning_rate": 7.248979591836735e-05,
+      "loss": 0.3461,
+      "step": 390
+    },
+    {
+      "epoch": 2.55,
+      "learning_rate": 7.187755102040816e-05,
+      "loss": 0.3594,
+      "step": 400
+    },
+    {
+      "epoch": 2.55,
+      "eval_loss": 0.35238778591156006,
+      "eval_runtime": 179.1609,
+      "eval_samples_per_second": 11.163,
+      "eval_steps_per_second": 0.698,
+      "step": 400
+    },
+    {
+      "epoch": 2.61,
+      "learning_rate": 7.126530612244898e-05,
+      "loss": 0.3472,
+      "step": 410
+    },
+    {
+      "epoch": 2.68,
+      "learning_rate": 7.065306122448981e-05,
+      "loss": 0.3314,
+      "step": 420
+    },
+    {
+      "epoch": 2.74,
+      "learning_rate": 7.004081632653062e-05,
+      "loss": 0.3323,
+      "step": 430
+    },
+    {
+      "epoch": 2.8,
+      "learning_rate": 6.942857142857144e-05,
+      "loss": 0.3141,
+      "step": 440
+    },
+    {
+      "epoch": 2.87,
+      "learning_rate": 6.881632653061225e-05,
+      "loss": 0.3242,
+      "step": 450
+    },
+    {
+      "epoch": 2.93,
+      "learning_rate": 6.820408163265307e-05,
+      "loss": 0.329,
+      "step": 460
+    },
+    {
+      "epoch": 2.99,
+      "learning_rate": 6.759183673469389e-05,
+      "loss": 0.3146,
+      "step": 470
+    },
+    {
+      "epoch": 3.06,
+      "learning_rate": 6.69795918367347e-05,
+      "loss": 0.289,
+      "step": 480
+    },
+    {
+      "epoch": 3.12,
+      "learning_rate": 6.636734693877552e-05,
+      "loss": 0.2973,
+      "step": 490
+    },
+    {
+      "epoch": 3.18,
+      "learning_rate": 6.575510204081633e-05,
+      "loss": 0.313,
+      "step": 500
+    },
+    {
+      "epoch": 3.25,
+      "learning_rate": 6.514285714285715e-05,
+      "loss": 0.2974,
+      "step": 510
+    },
+    {
+      "epoch": 3.31,
+      "learning_rate": 6.453061224489796e-05,
+      "loss": 0.2822,
+      "step": 520
+    },
+    {
+      "epoch": 3.38,
+      "learning_rate": 6.391836734693878e-05,
+      "loss": 0.2925,
+      "step": 530
+    },
+    {
+      "epoch": 3.44,
+      "learning_rate": 6.330612244897959e-05,
+      "loss": 0.2951,
+      "step": 540
+    },
+    {
+      "epoch": 3.5,
+      "learning_rate": 6.269387755102041e-05,
+      "loss": 0.2875,
+      "step": 550
+    },
+    {
+      "epoch": 3.57,
+      "learning_rate": 6.208163265306124e-05,
+      "loss": 0.2889,
+      "step": 560
+    },
+    {
+      "epoch": 3.63,
+      "learning_rate": 6.146938775510205e-05,
+      "loss": 0.2871,
+      "step": 570
+    },
+    {
+      "epoch": 3.69,
+      "learning_rate": 6.085714285714287e-05,
+      "loss": 0.2718,
+      "step": 580
+    },
+    {
+      "epoch": 3.76,
+      "learning_rate": 6.024489795918368e-05,
+      "loss": 0.285,
+      "step": 590
+    },
+    {
+      "epoch": 3.82,
+      "learning_rate": 5.96326530612245e-05,
+      "loss": 0.2705,
+      "step": 600
+    },
+    {
+      "epoch": 3.82,
+      "eval_loss": 0.2906799912452698,
+      "eval_runtime": 178.9594,
+      "eval_samples_per_second": 11.176,
+      "eval_steps_per_second": 0.698,
+      "step": 600
+    },
+    {
+      "epoch": 3.89,
+      "learning_rate": 5.902040816326531e-05,
+      "loss": 0.2794,
+      "step": 610
+    },
+    {
+      "epoch": 3.95,
+      "learning_rate": 5.840816326530613e-05,
+      "loss": 0.2784,
+      "step": 620
+    },
+    {
+      "epoch": 4.01,
+      "learning_rate": 5.779591836734694e-05,
+      "loss": 0.2862,
+      "step": 630
+    },
+    {
+      "epoch": 4.08,
+      "learning_rate": 5.718367346938776e-05,
+      "loss": 0.2695,
+      "step": 640
+    },
+    {
+      "epoch": 4.14,
+      "learning_rate": 5.6571428571428574e-05,
+      "loss": 0.2732,
+      "step": 650
+    },
+    {
+      "epoch": 4.2,
+      "learning_rate": 5.595918367346939e-05,
+      "loss": 0.265,
+      "step": 660
+    },
+    {
+      "epoch": 4.27,
+      "learning_rate": 5.5346938775510204e-05,
+      "loss": 0.2703,
+      "step": 670
+    },
+    {
+      "epoch": 4.33,
+      "learning_rate": 5.473469387755102e-05,
+      "loss": 0.2683,
+      "step": 680
+    },
+    {
+      "epoch": 4.39,
+      "learning_rate": 5.4122448979591835e-05,
+      "loss": 0.2666,
+      "step": 690
+    },
+    {
+      "epoch": 4.46,
+      "learning_rate": 5.351020408163265e-05,
+      "loss": 0.2571,
+      "step": 700
+    },
+    {
+      "epoch": 4.52,
+      "learning_rate": 5.289795918367348e-05,
+      "loss": 0.2723,
+      "step": 710
+    },
+    {
+      "epoch": 4.59,
+      "learning_rate": 5.2285714285714294e-05,
+      "loss": 0.2739,
+      "step": 720
+    },
+    {
+      "epoch": 4.65,
+      "learning_rate": 5.167346938775511e-05,
+      "loss": 0.2592,
+      "step": 730
+    },
+    {
+      "epoch": 4.71,
+      "learning_rate": 5.1061224489795925e-05,
+      "loss": 0.261,
+      "step": 740
+    },
+    {
+      "epoch": 4.78,
+      "learning_rate": 5.044897959183674e-05,
+      "loss": 0.2613,
+      "step": 750
+    },
+    {
+      "epoch": 4.84,
+      "learning_rate": 4.9836734693877555e-05,
+      "loss": 0.2641,
+      "step": 760
+    },
+    {
+      "epoch": 4.9,
+      "learning_rate": 4.922448979591837e-05,
+      "loss": 0.2648,
+      "step": 770
+    },
+    {
+      "epoch": 4.97,
+      "learning_rate": 4.8612244897959186e-05,
+      "loss": 0.2632,
+      "step": 780
+    },
+    {
+      "epoch": 5.03,
+      "learning_rate": 4.8e-05,
+      "loss": 0.2692,
+      "step": 790
+    },
+    {
+      "epoch": 5.1,
+      "learning_rate": 4.7387755102040816e-05,
+      "loss": 0.2526,
+      "step": 800
+    },
+    {
+      "epoch": 5.1,
+      "eval_loss": 0.273679256439209,
+      "eval_runtime": 179.1874,
+      "eval_samples_per_second": 11.162,
+      "eval_steps_per_second": 0.698,
+      "step": 800
+    },
+    {
+      "epoch": 5.16,
+      "learning_rate": 4.677551020408163e-05,
+      "loss": 0.261,
+      "step": 810
+    },
+    {
+      "epoch": 5.22,
+      "learning_rate": 4.616326530612245e-05,
+      "loss": 0.2459,
+      "step": 820
+    },
+    {
+      "epoch": 5.29,
+      "learning_rate": 4.555102040816326e-05,
+      "loss": 0.2595,
+      "step": 830
+    },
+    {
+      "epoch": 5.35,
+      "learning_rate": 4.4938775510204084e-05,
+      "loss": 0.2545,
+      "step": 840
+    },
+    {
+      "epoch": 5.41,
+      "learning_rate": 4.43265306122449e-05,
+      "loss": 0.2628,
+      "step": 850
+    },
+    {
+      "epoch": 5.48,
+      "learning_rate": 4.3714285714285715e-05,
+      "loss": 0.2606,
+      "step": 860
+    },
+    {
+      "epoch": 5.54,
+      "learning_rate": 4.310204081632653e-05,
+      "loss": 0.2512,
+      "step": 870
+    },
+    {
+      "epoch": 5.61,
+      "learning_rate": 4.2489795918367345e-05,
+      "loss": 0.2566,
+      "step": 880
+    },
+    {
+      "epoch": 5.67,
+      "learning_rate": 4.187755102040816e-05,
+      "loss": 0.2572,
+      "step": 890
+    },
+    {
+      "epoch": 5.73,
+      "learning_rate": 4.126530612244898e-05,
+      "loss": 0.247,
+      "step": 900
+    },
+    {
+      "epoch": 5.8,
+      "learning_rate": 4.06530612244898e-05,
+      "loss": 0.255,
+      "step": 910
+    },
+    {
+      "epoch": 5.86,
+      "learning_rate": 4.004081632653061e-05,
+      "loss": 0.263,
+      "step": 920
+    },
+    {
+      "epoch": 5.92,
+      "learning_rate": 3.942857142857143e-05,
+      "loss": 0.2622,
+      "step": 930
+    },
+    {
+      "epoch": 5.99,
+      "learning_rate": 3.8816326530612244e-05,
+      "loss": 0.2496,
+      "step": 940
+    },
+    {
+      "epoch": 6.05,
+      "learning_rate": 3.820408163265306e-05,
+      "loss": 0.258,
+      "step": 950
+    },
+    {
+      "epoch": 6.11,
+      "learning_rate": 3.7591836734693874e-05,
+      "loss": 0.2425,
+      "step": 960
+    },
+    {
+      "epoch": 6.18,
+      "learning_rate": 3.6979591836734696e-05,
+      "loss": 0.2444,
+      "step": 970
+    },
+    {
+      "epoch": 6.24,
+      "learning_rate": 3.636734693877551e-05,
+      "loss": 0.2476,
+      "step": 980
+    },
+    {
+      "epoch": 6.31,
+      "learning_rate": 3.575510204081633e-05,
+      "loss": 0.2429,
+      "step": 990
+    },
+    {
+      "epoch": 6.37,
+      "learning_rate": 3.514285714285714e-05,
+      "loss": 0.2512,
+      "step": 1000
+    },
+    {
+      "epoch": 6.37,
+      "eval_loss": 0.2628234326839447,
+      "eval_runtime": 178.8184,
+      "eval_samples_per_second": 11.185,
+      "eval_steps_per_second": 0.699,
+      "step": 1000
+    }
+  ],
+  "max_steps": 1570,
+  "num_train_epochs": 10,
+  "total_flos": 3.234749529446154e+17,
+  "trial_name": null,
+  "trial_params": null
+}
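The trainer_state.json above records the full train/eval loss curve in its `log_history` list. A minimal sketch of extracting the (step, eval_loss) pairs from such a file with the Python standard library (the checkpoint path in the comment is hypothetical; adjust it to wherever the checkpoint lives):

```python
import json

def eval_curve(state: dict) -> list:
    """Return (step, eval_loss) pairs from a Trainer state dict.

    Train-loss entries carry a "loss" key, eval entries an "eval_loss"
    key, so filtering on "eval_loss" picks out the evaluation points.
    """
    return [(entry["step"], entry["eval_loss"])
            for entry in state["log_history"]
            if "eval_loss" in entry]

# Hypothetical usage against a checkpoint on disk:
# with open("checkpoint-1000/trainer_state.json") as f:
#     print(eval_curve(json.load(f)))
```

This is one way to eyeball whether eval loss is still dropping between checkpoints before deciding to keep training.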
{checkpoint-400 β†’ checkpoint-1000}/training_args.bin RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7dd968ac3692484ae7ef48f2fe56fbc52fc522e7cedd7f98c741d309d92c7408
+oid sha256:6db1d76c8c58296c6ccd7f8675679f24bb5bc1caeeabf527135e316aab994bff
 size 3579
{checkpoint-400 β†’ checkpoint-1200}/optimizer.pt RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:91fdba9bd1d20770960aca4e84fef26f9b5ec078297805aad30b770488200e18
+oid sha256:87fa5d20358c5c3180f5c58aee72fe1a6695eb24b03f5c28b90d121c651b08c8
 size 33661637
{checkpoint-200 β†’ checkpoint-1200}/pytorch_model.bin RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0e98feaf29bcad5a4514ce4c0e7c421f687c009a6f047d60103c350dc3efacc7
+oid sha256:cea7c1a7489ecc6765f9e09d47c1d7a0d638820a4225f61509e1de25fa1556ba
 size 16822989
{checkpoint-200 β†’ checkpoint-1200}/rng_state_0.pth RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:22272e3838963bad9b0a3193dce5b231ec52af46288ed966f9c19fee3d2ae2ad
+oid sha256:f40f510d83877d97169e3e7017e6351d58d653fc20f9328aa70ee78cf702d017
 size 14583
{checkpoint-200 β†’ checkpoint-1200}/rng_state_1.pth RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a84efcd3c2c174f123d98c04e7f9972f182f515c9aeafef4e77248e60e90d277
+oid sha256:eda8c300e110c272548bc6247cb827107d9abdf72de04ad474e06742c3dcf7d5
 size 14583
{checkpoint-400 β†’ checkpoint-1200}/scaler.pt RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:800eaaad24101923b7afda0cde67e8241ab9447a6c388a7805b55fd6229bcf13
+oid sha256:39d51ec17e1ddffa554b673fbc942280df63b8160e8267e949a87258ac2e47d9
 size 557
{checkpoint-400 β†’ checkpoint-1200}/scheduler.pt RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6b5433e62027d6b49a231a798638a7c2420ff2f540a1905be487feea51a17236
+oid sha256:4e8b99cd5679e5a59eba6034ebb2efd6f1fbca5d7a9715bb0e51e65eb0b1fc9d
 size 627
checkpoint-1200/trainer_state.json ADDED
@@ -0,0 +1,784 @@
+{
+  "best_metric": 0.25768929719924927,
+  "best_model_checkpoint": "./lora-alpaca/checkpoint-1200",
+  "epoch": 7.643312101910828,
+  "global_step": 1200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.06,
+      "learning_rate": 7.2000000000000005e-06,
+      "loss": 2.2569,
+      "step": 10
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.62e-05,
+      "loss": 2.1837,
+      "step": 20
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 2.5200000000000003e-05,
+      "loss": 2.0973,
+      "step": 30
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 3.4200000000000005e-05,
+      "loss": 2.0635,
+      "step": 40
+    },
+    {
+      "epoch": 0.32,
+      "learning_rate": 4.32e-05,
+      "loss": 1.9233,
+      "step": 50
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 5.13e-05,
+      "loss": 1.6655,
+      "step": 60
+    },
+    {
+      "epoch": 0.45,
+      "learning_rate": 5.940000000000001e-05,
+      "loss": 1.4289,
+      "step": 70
+    },
+    {
+      "epoch": 0.51,
+      "learning_rate": 6.840000000000001e-05,
+      "loss": 1.2303,
+      "step": 80
+    },
+    {
+      "epoch": 0.57,
+      "learning_rate": 7.74e-05,
+      "loss": 1.1787,
+      "step": 90
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 8.64e-05,
+      "loss": 1.0882,
+      "step": 100
+    },
+    {
+      "epoch": 0.7,
+      "learning_rate": 8.96326530612245e-05,
+      "loss": 1.0611,
+      "step": 110
+    },
+    {
+      "epoch": 0.76,
+      "learning_rate": 8.902040816326531e-05,
+      "loss": 0.9818,
+      "step": 120
+    },
+    {
+      "epoch": 0.83,
+      "learning_rate": 8.840816326530614e-05,
+      "loss": 0.9416,
+      "step": 130
+    },
+    {
+      "epoch": 0.89,
+      "learning_rate": 8.779591836734695e-05,
+      "loss": 0.9049,
+      "step": 140
+    },
+    {
+      "epoch": 0.96,
+      "learning_rate": 8.718367346938777e-05,
+      "loss": 0.8774,
+      "step": 150
+    },
+    {
+      "epoch": 1.02,
+      "learning_rate": 8.657142857142858e-05,
+      "loss": 0.8507,
+      "step": 160
+    },
+    {
+      "epoch": 1.08,
+      "learning_rate": 8.59591836734694e-05,
+      "loss": 0.7797,
+      "step": 170
+    },
+    {
+      "epoch": 1.15,
+      "learning_rate": 8.534693877551022e-05,
+      "loss": 0.7334,
+      "step": 180
+    },
+    {
+      "epoch": 1.21,
+      "learning_rate": 8.473469387755103e-05,
+      "loss": 0.712,
+      "step": 190
+    },
+    {
+      "epoch": 1.27,
+      "learning_rate": 8.412244897959185e-05,
+      "loss": 0.6587,
+      "step": 200
+    },
+    {
+      "epoch": 1.27,
+      "eval_loss": 0.6659865379333496,
+      "eval_runtime": 179.3178,
+      "eval_samples_per_second": 11.153,
+      "eval_steps_per_second": 0.697,
+      "step": 200
+    },
+    {
+      "epoch": 1.34,
+      "learning_rate": 8.351020408163266e-05,
+      "loss": 0.6116,
+      "step": 210
+    },
+    {
+      "epoch": 1.4,
+      "learning_rate": 8.289795918367348e-05,
+      "loss": 0.6065,
+      "step": 220
+    },
+    {
+      "epoch": 1.46,
+      "learning_rate": 8.228571428571429e-05,
+      "loss": 0.6109,
+      "step": 230
+    },
+    {
+      "epoch": 1.53,
+      "learning_rate": 8.167346938775511e-05,
+      "loss": 0.5603,
+      "step": 240
+    },
+    {
+      "epoch": 1.59,
+      "learning_rate": 8.106122448979592e-05,
+      "loss": 0.5223,
+      "step": 250
+    },
+    {
+      "epoch": 1.66,
+      "learning_rate": 8.044897959183674e-05,
+      "loss": 0.5039,
+      "step": 260
+    },
+    {
+      "epoch": 1.72,
+      "learning_rate": 7.983673469387757e-05,
+      "loss": 0.4482,
+      "step": 270
+    },
+    {
+      "epoch": 1.78,
+      "learning_rate": 7.922448979591838e-05,
+      "loss": 0.4494,
+      "step": 280
+    },
+    {
+      "epoch": 1.85,
+      "learning_rate": 7.86122448979592e-05,
+      "loss": 0.4578,
+      "step": 290
+    },
+    {
+      "epoch": 1.91,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 0.4346,
+      "step": 300
+    },
+    {
+      "epoch": 1.97,
+      "learning_rate": 7.738775510204083e-05,
+      "loss": 0.4274,
+      "step": 310
+    },
+    {
+      "epoch": 2.04,
+      "learning_rate": 7.677551020408164e-05,
+      "loss": 0.3999,
+      "step": 320
+    },
+    {
+      "epoch": 2.1,
+      "learning_rate": 7.616326530612246e-05,
+      "loss": 0.3825,
+      "step": 330
+    },
+    {
+      "epoch": 2.17,
+      "learning_rate": 7.555102040816327e-05,
+      "loss": 0.382,
+      "step": 340
+    },
+    {
+      "epoch": 2.23,
+      "learning_rate": 7.493877551020409e-05,
+      "loss": 0.3652,
+      "step": 350
+    },
+    {
+      "epoch": 2.29,
+      "learning_rate": 7.43265306122449e-05,
+      "loss": 0.3732,
+      "step": 360
+    },
+    {
+      "epoch": 2.36,
+      "learning_rate": 7.371428571428572e-05,
+      "loss": 0.3422,
+      "step": 370
+    },
+    {
+      "epoch": 2.42,
+      "learning_rate": 7.310204081632653e-05,
+      "loss": 0.3661,
+      "step": 380
+    },
+    {
+      "epoch": 2.48,
+      "learning_rate": 7.248979591836735e-05,
+      "loss": 0.3461,
+      "step": 390
+    },
+    {
+      "epoch": 2.55,
+      "learning_rate": 7.187755102040816e-05,
+      "loss": 0.3594,
+      "step": 400
+    },
+    {
+      "epoch": 2.55,
+      "eval_loss": 0.35238778591156006,
+      "eval_runtime": 179.1609,
+      "eval_samples_per_second": 11.163,
+      "eval_steps_per_second": 0.698,
+      "step": 400
+    },
+    {
+      "epoch": 2.61,
+      "learning_rate": 7.126530612244898e-05,
+      "loss": 0.3472,
+      "step": 410
+    },
+    {
+      "epoch": 2.68,
+      "learning_rate": 7.065306122448981e-05,
+      "loss": 0.3314,
+      "step": 420
+    },
+    {
+      "epoch": 2.74,
+      "learning_rate": 7.004081632653062e-05,
+      "loss": 0.3323,
+      "step": 430
+    },
+    {
+      "epoch": 2.8,
+      "learning_rate": 6.942857142857144e-05,
+      "loss": 0.3141,
+      "step": 440
+    },
+    {
+      "epoch": 2.87,
+      "learning_rate": 6.881632653061225e-05,
+      "loss": 0.3242,
+      "step": 450
+    },
+    {
+      "epoch": 2.93,
+      "learning_rate": 6.820408163265307e-05,
+      "loss": 0.329,
+      "step": 460
+    },
+    {
+      "epoch": 2.99,
+      "learning_rate": 6.759183673469389e-05,
+      "loss": 0.3146,
+      "step": 470
+    },
+    {
+      "epoch": 3.06,
+      "learning_rate": 6.69795918367347e-05,
+      "loss": 0.289,
+      "step": 480
+    },
+    {
+      "epoch": 3.12,
+      "learning_rate": 6.636734693877552e-05,
+      "loss": 0.2973,
+      "step": 490
+    },
+    {
+      "epoch": 3.18,
+      "learning_rate": 6.575510204081633e-05,
+      "loss": 0.313,
+      "step": 500
+    },
+    {
+      "epoch": 3.25,
+      "learning_rate": 6.514285714285715e-05,
+      "loss": 0.2974,
+      "step": 510
+    },
+    {
+      "epoch": 3.31,
+      "learning_rate": 6.453061224489796e-05,
+      "loss": 0.2822,
+      "step": 520
+    },
+    {
+      "epoch": 3.38,
+      "learning_rate": 6.391836734693878e-05,
+      "loss": 0.2925,
+      "step": 530
+    },
+    {
+      "epoch": 3.44,
+      "learning_rate": 6.330612244897959e-05,
+      "loss": 0.2951,
+      "step": 540
+    },
+    {
+      "epoch": 3.5,
+      "learning_rate": 6.269387755102041e-05,
+      "loss": 0.2875,
+      "step": 550
+    },
+    {
+      "epoch": 3.57,
+      "learning_rate": 6.208163265306124e-05,
+      "loss": 0.2889,
+      "step": 560
+    },
+    {
+      "epoch": 3.63,
+      "learning_rate": 6.146938775510205e-05,
+      "loss": 0.2871,
+      "step": 570
+    },
+    {
+      "epoch": 3.69,
+      "learning_rate": 6.085714285714287e-05,
+      "loss": 0.2718,
+      "step": 580
+    },
+    {
+      "epoch": 3.76,
+      "learning_rate": 6.024489795918368e-05,
+      "loss": 0.285,
+      "step": 590
+    },
+    {
+      "epoch": 3.82,
+      "learning_rate": 5.96326530612245e-05,
+      "loss": 0.2705,
+      "step": 600
+    },
+    {
+      "epoch": 3.82,
+      "eval_loss": 0.2906799912452698,
+      "eval_runtime": 178.9594,
+      "eval_samples_per_second": 11.176,
+      "eval_steps_per_second": 0.698,
+      "step": 600
+    },
+    {
+      "epoch": 3.89,
+      "learning_rate": 5.902040816326531e-05,
+      "loss": 0.2794,
+      "step": 610
+    },
+    {
+      "epoch": 3.95,
+      "learning_rate": 5.840816326530613e-05,
+      "loss": 0.2784,
+      "step": 620
+    },
+    {
+      "epoch": 4.01,
+      "learning_rate": 5.779591836734694e-05,
+      "loss": 0.2862,
+      "step": 630
+    },
+    {
+      "epoch": 4.08,
+      "learning_rate": 5.718367346938776e-05,
+      "loss": 0.2695,
+      "step": 640
+    },
+    {
+      "epoch": 4.14,
+      "learning_rate": 5.6571428571428574e-05,
+      "loss": 0.2732,
+      "step": 650
+    },
+    {
+      "epoch": 4.2,
+      "learning_rate": 5.595918367346939e-05,
+      "loss": 0.265,
+      "step": 660
+    },
+    {
+      "epoch": 4.27,
+      "learning_rate": 5.5346938775510204e-05,
+      "loss": 0.2703,
+      "step": 670
+    },
+    {
+      "epoch": 4.33,
+      "learning_rate": 5.473469387755102e-05,
+      "loss": 0.2683,
+      "step": 680
+    },
+    {
+      "epoch": 4.39,
+      "learning_rate": 5.4122448979591835e-05,
+      "loss": 0.2666,
+      "step": 690
+    },
+    {
+      "epoch": 4.46,
+      "learning_rate": 5.351020408163265e-05,
+      "loss": 0.2571,
+      "step": 700
+    },
+    {
+      "epoch": 4.52,
+      "learning_rate": 5.289795918367348e-05,
+      "loss": 0.2723,
+      "step": 710
+    },
+    {
+      "epoch": 4.59,
+      "learning_rate": 5.2285714285714294e-05,
+      "loss": 0.2739,
+      "step": 720
+    },
+    {
+      "epoch": 4.65,
+      "learning_rate": 5.167346938775511e-05,
+      "loss": 0.2592,
+      "step": 730
+    },
+    {
+      "epoch": 4.71,
+      "learning_rate": 5.1061224489795925e-05,
+      "loss": 0.261,
+      "step": 740
+    },
+    {
+      "epoch": 4.78,
+      "learning_rate": 5.044897959183674e-05,
+      "loss": 0.2613,
+      "step": 750
+    },
+    {
+      "epoch": 4.84,
+      "learning_rate": 4.9836734693877555e-05,
+      "loss": 0.2641,
+      "step": 760
+    },
+    {
+      "epoch": 4.9,
+      "learning_rate": 4.922448979591837e-05,
+      "loss": 0.2648,
+      "step": 770
+    },
+    {
+      "epoch": 4.97,
+      "learning_rate": 4.8612244897959186e-05,
+      "loss": 0.2632,
+      "step": 780
+    },
+    {
+      "epoch": 5.03,
+      "learning_rate": 4.8e-05,
+      "loss": 0.2692,
+      "step": 790
+    },
+    {
+      "epoch": 5.1,
+      "learning_rate": 4.7387755102040816e-05,
+      "loss": 0.2526,
+      "step": 800
+    },
+    {
+      "epoch": 5.1,
+      "eval_loss": 0.273679256439209,
+      "eval_runtime": 179.1874,
+      "eval_samples_per_second": 11.162,
+      "eval_steps_per_second": 0.698,
+      "step": 800
+    },
+    {
+      "epoch": 5.16,
+      "learning_rate": 4.677551020408163e-05,
+      "loss": 0.261,
+      "step": 810
+    },
+    {
+      "epoch": 5.22,
+      "learning_rate": 4.616326530612245e-05,
+      "loss": 0.2459,
+      "step": 820
+    },
+    {
+      "epoch": 5.29,
+      "learning_rate": 4.555102040816326e-05,
+      "loss": 0.2595,
+      "step": 830
+    },
+    {
+      "epoch": 5.35,
+      "learning_rate": 4.4938775510204084e-05,
+      "loss": 0.2545,
+      "step": 840
+    },
+    {
+      "epoch": 5.41,
+      "learning_rate": 4.43265306122449e-05,
+      "loss": 0.2628,
+      "step": 850
+    },
+    {
+      "epoch": 5.48,
+      "learning_rate": 4.3714285714285715e-05,
+      "loss": 0.2606,
+      "step": 860
+    },
+    {
+      "epoch": 5.54,
+      "learning_rate": 4.310204081632653e-05,
+      "loss": 0.2512,
+      "step": 870
+    },
+    {
+      "epoch": 5.61,
+      "learning_rate": 4.2489795918367345e-05,
+      "loss": 0.2566,
+      "step": 880
+    },
+    {
+      "epoch": 5.67,
+      "learning_rate": 4.187755102040816e-05,
+      "loss": 0.2572,
+      "step": 890
+    },
+    {
+      "epoch": 5.73,
+      "learning_rate": 4.126530612244898e-05,
+      "loss": 0.247,
+      "step": 900
+    },
+    {
+      "epoch": 5.8,
+      "learning_rate": 4.06530612244898e-05,
+      "loss": 0.255,
+      "step": 910
+    },
+    {
+      "epoch": 5.86,
+      "learning_rate": 4.004081632653061e-05,
+      "loss": 0.263,
+      "step": 920
+    },
+    {
+      "epoch": 5.92,
+      "learning_rate": 3.942857142857143e-05,
+      "loss": 0.2622,
+      "step": 930
+    },
+    {
+      "epoch": 5.99,
+      "learning_rate": 3.8816326530612244e-05,
+      "loss": 0.2496,
+      "step": 940
+    },
+    {
+      "epoch": 6.05,
+      "learning_rate": 3.820408163265306e-05,
+      "loss": 0.258,
+      "step": 950
+    },
+    {
+      "epoch": 6.11,
+      "learning_rate": 3.7591836734693874e-05,
+      "loss": 0.2425,
+      "step": 960
+    },
+    {
+      "epoch": 6.18,
+      "learning_rate": 3.6979591836734696e-05,
+      "loss": 0.2444,
+      "step": 970
+    },
+    {
+      "epoch": 6.24,
+      "learning_rate": 3.636734693877551e-05,
+      "loss": 0.2476,
+      "step": 980
+    },
+    {
+      "epoch": 6.31,
+      "learning_rate": 3.575510204081633e-05,
+      "loss": 0.2429,
+      "step": 990
+    },
+    {
+      "epoch": 6.37,
+      "learning_rate": 3.514285714285714e-05,
+      "loss": 0.2512,
+      "step": 1000
+    },
+    {
+      "epoch": 6.37,
+      "eval_loss": 0.2628234326839447,
+      "eval_runtime": 178.8184,
+      "eval_samples_per_second": 11.185,
+      "eval_steps_per_second": 0.699,
+      "step": 1000
+    },
+    {
+      "epoch": 6.43,
+      "learning_rate": 3.453061224489796e-05,
+      "loss": 0.2484,
+      "step": 1010
+    },
+    {
+      "epoch": 6.5,
+      "learning_rate": 3.391836734693878e-05,
+      "loss": 0.2448,
+      "step": 1020
+    },
+    {
+      "epoch": 6.56,
+      "learning_rate": 3.3306122448979595e-05,
+      "loss": 0.2355,
+      "step": 1030
+    },
+    {
+      "epoch": 6.62,
+      "learning_rate": 3.269387755102041e-05,
+      "loss": 0.2445,
+      "step": 1040
+    },
+    {
+      "epoch": 6.69,
+      "learning_rate": 3.2081632653061225e-05,
+      "loss": 0.2477,
+      "step": 1050
+    },
+    {
+      "epoch": 6.75,
+      "learning_rate": 3.146938775510204e-05,
+      "loss": 0.2442,
+      "step": 1060
+    },
+    {
+      "epoch": 6.82,
+      "learning_rate": 3.0857142857142856e-05,
+      "loss": 0.2449,
+      "step": 1070
+    },
+    {
+      "epoch": 6.88,
+      "learning_rate": 3.0244897959183675e-05,
+      "loss": 0.2382,
+      "step": 1080
+    },
+    {
+      "epoch": 6.94,
+      "learning_rate": 2.963265306122449e-05,
+      "loss": 0.2392,
+      "step": 1090
+    },
+    {
+      "epoch": 7.01,
+      "learning_rate": 2.9020408163265305e-05,
+      "loss": 0.2418,
+      "step": 1100
+    },
+    {
+      "epoch": 7.07,
+      "learning_rate": 2.840816326530612e-05,
+      "loss": 0.2387,
+      "step": 1110
+    },
+    {
+      "epoch": 7.13,
+      "learning_rate": 2.7795918367346943e-05,
+      "loss": 0.2431,
+      "step": 1120
+    },
+    {
+      "epoch": 7.2,
+      "learning_rate": 2.7183673469387758e-05,
+      "loss": 0.2453,
+      "step": 1130
+    },
+    {
+      "epoch": 7.26,
+      "learning_rate": 2.6571428571428573e-05,
+      "loss": 0.2452,
+      "step": 1140
+    },
+    {
+      "epoch": 7.32,
+      "learning_rate": 2.595918367346939e-05,
+      "loss": 0.2411,
+      "step": 1150
+    },
+    {
+      "epoch": 7.39,
+      "learning_rate": 2.5346938775510204e-05,
+      "loss": 0.2448,
+      "step": 1160
+    },
+    {
+      "epoch": 7.45,
+      "learning_rate": 2.473469387755102e-05,
+      "loss": 0.2407,
+      "step": 1170
+    },
+    {
+      "epoch": 7.52,
+      "learning_rate": 2.4122448979591838e-05,
+      "loss": 0.2428,
+      "step": 1180
+    },
+    {
+      "epoch": 7.58,
+      "learning_rate": 2.3510204081632656e-05,
+      "loss": 0.2434,
+      "step": 1190
+    },
+    {
+      "epoch": 7.64,
+      "learning_rate": 2.289795918367347e-05,
+      "loss": 0.2435,
+      "step": 1200
+    },
+    {
+      "epoch": 7.64,
+      "eval_loss": 0.25768929719924927,
+      "eval_runtime": 179.2212,
+      "eval_samples_per_second": 11.159,
+      "eval_steps_per_second": 0.697,
+      "step": 1200
+    }
+  ],
+  "max_steps": 1570,
+  "num_train_epochs": 10,
+  "total_flos": 3.882151346962432e+17,
+  "trial_name": null,
+  "trial_params": null
+}
{checkpoint-200 β†’ checkpoint-1200}/training_args.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7dd968ac3692484ae7ef48f2fe56fbc52fc522e7cedd7f98c741d309d92c7408
+ oid sha256:6db1d76c8c58296c6ccd7f8675679f24bb5bc1caeeabf527135e316aab994bff
  size 3579
checkpoint-1400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbaee1746d307e046d56259f91baf7c3620ed7f1755c0c8e13e53a65fa9f5a24
+ size 33661637
checkpoint-1400/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5b34574739dc0db6163cd0340442562a180e8a6ea7c97efaeb9a7a7a295999b
+ size 16822989
checkpoint-1400/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8df4712fb96da9516182a13dcca9fe77e2d570a5eb7182924dbd8d17b6330900
+ size 14583
checkpoint-1400/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b60d9cf614eb8a06e3c140ef91cf08487cf6a5eb292e2bc44a9b26d184472cf7
+ size 14583
checkpoint-1400/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51b0c632f8bb35328924fe2936840fcb014f196c4a6f71dcd86e89c10b1a5342
+ size 557
checkpoint-1400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1374d0569a391f02ab3ad7de4ced2a5779221864e779ce80936fee4d4059e245
+ size 627
checkpoint-1400/trainer_state.json ADDED
@@ -0,0 +1,912 @@
+ {
+ "best_metric": 0.25404658913612366,
+ "best_model_checkpoint": "./lora-alpaca/checkpoint-1400",
+ "epoch": 8.9171974522293,
+ "global_step": 1400,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.06,
+ "learning_rate": 7.2000000000000005e-06,
+ "loss": 2.2569,
+ "step": 10
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 1.62e-05,
+ "loss": 2.1837,
+ "step": 20
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 2.5200000000000003e-05,
+ "loss": 2.0973,
+ "step": 30
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 3.4200000000000005e-05,
+ "loss": 2.0635,
+ "step": 40
+ },
+ {
+ "epoch": 0.32,
+ "learning_rate": 4.32e-05,
+ "loss": 1.9233,
+ "step": 50
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 5.13e-05,
+ "loss": 1.6655,
+ "step": 60
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 5.940000000000001e-05,
+ "loss": 1.4289,
+ "step": 70
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 6.840000000000001e-05,
+ "loss": 1.2303,
+ "step": 80
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 7.74e-05,
+ "loss": 1.1787,
+ "step": 90
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 8.64e-05,
+ "loss": 1.0882,
+ "step": 100
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 8.96326530612245e-05,
+ "loss": 1.0611,
+ "step": 110
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 8.902040816326531e-05,
+ "loss": 0.9818,
+ "step": 120
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 8.840816326530614e-05,
+ "loss": 0.9416,
+ "step": 130
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 8.779591836734695e-05,
+ "loss": 0.9049,
+ "step": 140
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 8.718367346938777e-05,
+ "loss": 0.8774,
+ "step": 150
+ },
+ {
+ "epoch": 1.02,
+ "learning_rate": 8.657142857142858e-05,
+ "loss": 0.8507,
+ "step": 160
+ },
+ {
+ "epoch": 1.08,
+ "learning_rate": 8.59591836734694e-05,
+ "loss": 0.7797,
+ "step": 170
+ },
+ {
+ "epoch": 1.15,
+ "learning_rate": 8.534693877551022e-05,
+ "loss": 0.7334,
+ "step": 180
+ },
+ {
+ "epoch": 1.21,
+ "learning_rate": 8.473469387755103e-05,
+ "loss": 0.712,
+ "step": 190
+ },
+ {
+ "epoch": 1.27,
+ "learning_rate": 8.412244897959185e-05,
+ "loss": 0.6587,
+ "step": 200
+ },
+ {
+ "epoch": 1.27,
+ "eval_loss": 0.6659865379333496,
+ "eval_runtime": 179.3178,
+ "eval_samples_per_second": 11.153,
+ "eval_steps_per_second": 0.697,
+ "step": 200
+ },
+ {
+ "epoch": 1.34,
+ "learning_rate": 8.351020408163266e-05,
+ "loss": 0.6116,
+ "step": 210
+ },
+ {
+ "epoch": 1.4,
+ "learning_rate": 8.289795918367348e-05,
+ "loss": 0.6065,
+ "step": 220
+ },
+ {
+ "epoch": 1.46,
+ "learning_rate": 8.228571428571429e-05,
+ "loss": 0.6109,
+ "step": 230
+ },
+ {
+ "epoch": 1.53,
+ "learning_rate": 8.167346938775511e-05,
+ "loss": 0.5603,
+ "step": 240
+ },
+ {
+ "epoch": 1.59,
+ "learning_rate": 8.106122448979592e-05,
+ "loss": 0.5223,
+ "step": 250
+ },
+ {
+ "epoch": 1.66,
+ "learning_rate": 8.044897959183674e-05,
+ "loss": 0.5039,
+ "step": 260
+ },
+ {
+ "epoch": 1.72,
+ "learning_rate": 7.983673469387757e-05,
+ "loss": 0.4482,
+ "step": 270
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 7.922448979591838e-05,
+ "loss": 0.4494,
+ "step": 280
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 7.86122448979592e-05,
+ "loss": 0.4578,
+ "step": 290
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 7.800000000000001e-05,
+ "loss": 0.4346,
+ "step": 300
+ },
+ {
+ "epoch": 1.97,
+ "learning_rate": 7.738775510204083e-05,
+ "loss": 0.4274,
+ "step": 310
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 7.677551020408164e-05,
+ "loss": 0.3999,
+ "step": 320
+ },
+ {
+ "epoch": 2.1,
+ "learning_rate": 7.616326530612246e-05,
+ "loss": 0.3825,
+ "step": 330
+ },
+ {
+ "epoch": 2.17,
+ "learning_rate": 7.555102040816327e-05,
+ "loss": 0.382,
+ "step": 340
+ },
+ {
+ "epoch": 2.23,
+ "learning_rate": 7.493877551020409e-05,
+ "loss": 0.3652,
+ "step": 350
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 7.43265306122449e-05,
+ "loss": 0.3732,
+ "step": 360
+ },
+ {
+ "epoch": 2.36,
+ "learning_rate": 7.371428571428572e-05,
+ "loss": 0.3422,
+ "step": 370
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 7.310204081632653e-05,
+ "loss": 0.3661,
+ "step": 380
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 7.248979591836735e-05,
+ "loss": 0.3461,
+ "step": 390
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 7.187755102040816e-05,
+ "loss": 0.3594,
+ "step": 400
+ },
+ {
+ "epoch": 2.55,
+ "eval_loss": 0.35238778591156006,
+ "eval_runtime": 179.1609,
+ "eval_samples_per_second": 11.163,
+ "eval_steps_per_second": 0.698,
+ "step": 400
+ },
+ {
+ "epoch": 2.61,
+ "learning_rate": 7.126530612244898e-05,
+ "loss": 0.3472,
+ "step": 410
+ },
+ {
+ "epoch": 2.68,
+ "learning_rate": 7.065306122448981e-05,
+ "loss": 0.3314,
+ "step": 420
+ },
+ {
+ "epoch": 2.74,
+ "learning_rate": 7.004081632653062e-05,
+ "loss": 0.3323,
+ "step": 430
+ },
+ {
+ "epoch": 2.8,
+ "learning_rate": 6.942857142857144e-05,
+ "loss": 0.3141,
+ "step": 440
+ },
+ {
+ "epoch": 2.87,
+ "learning_rate": 6.881632653061225e-05,
+ "loss": 0.3242,
+ "step": 450
+ },
+ {
+ "epoch": 2.93,
+ "learning_rate": 6.820408163265307e-05,
+ "loss": 0.329,
+ "step": 460
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 6.759183673469389e-05,
+ "loss": 0.3146,
+ "step": 470
+ },
+ {
+ "epoch": 3.06,
+ "learning_rate": 6.69795918367347e-05,
+ "loss": 0.289,
+ "step": 480
+ },
+ {
+ "epoch": 3.12,
+ "learning_rate": 6.636734693877552e-05,
+ "loss": 0.2973,
+ "step": 490
+ },
+ {
+ "epoch": 3.18,
+ "learning_rate": 6.575510204081633e-05,
+ "loss": 0.313,
+ "step": 500
+ },
+ {
+ "epoch": 3.25,
+ "learning_rate": 6.514285714285715e-05,
+ "loss": 0.2974,
+ "step": 510
+ },
+ {
+ "epoch": 3.31,
+ "learning_rate": 6.453061224489796e-05,
+ "loss": 0.2822,
+ "step": 520
+ },
+ {
+ "epoch": 3.38,
+ "learning_rate": 6.391836734693878e-05,
+ "loss": 0.2925,
+ "step": 530
+ },
+ {
+ "epoch": 3.44,
+ "learning_rate": 6.330612244897959e-05,
+ "loss": 0.2951,
+ "step": 540
+ },
+ {
+ "epoch": 3.5,
+ "learning_rate": 6.269387755102041e-05,
+ "loss": 0.2875,
+ "step": 550
+ },
+ {
+ "epoch": 3.57,
+ "learning_rate": 6.208163265306124e-05,
+ "loss": 0.2889,
+ "step": 560
+ },
+ {
+ "epoch": 3.63,
+ "learning_rate": 6.146938775510205e-05,
+ "loss": 0.2871,
+ "step": 570
+ },
+ {
+ "epoch": 3.69,
+ "learning_rate": 6.085714285714287e-05,
+ "loss": 0.2718,
+ "step": 580
+ },
+ {
+ "epoch": 3.76,
+ "learning_rate": 6.024489795918368e-05,
+ "loss": 0.285,
+ "step": 590
+ },
+ {
+ "epoch": 3.82,
+ "learning_rate": 5.96326530612245e-05,
+ "loss": 0.2705,
+ "step": 600
+ },
+ {
+ "epoch": 3.82,
+ "eval_loss": 0.2906799912452698,
+ "eval_runtime": 178.9594,
+ "eval_samples_per_second": 11.176,
+ "eval_steps_per_second": 0.698,
+ "step": 600
+ },
+ {
+ "epoch": 3.89,
+ "learning_rate": 5.902040816326531e-05,
+ "loss": 0.2794,
+ "step": 610
+ },
+ {
+ "epoch": 3.95,
+ "learning_rate": 5.840816326530613e-05,
+ "loss": 0.2784,
+ "step": 620
+ },
+ {
+ "epoch": 4.01,
+ "learning_rate": 5.779591836734694e-05,
+ "loss": 0.2862,
+ "step": 630
+ },
+ {
+ "epoch": 4.08,
+ "learning_rate": 5.718367346938776e-05,
+ "loss": 0.2695,
+ "step": 640
+ },
+ {
+ "epoch": 4.14,
+ "learning_rate": 5.6571428571428574e-05,
+ "loss": 0.2732,
+ "step": 650
+ },
+ {
+ "epoch": 4.2,
+ "learning_rate": 5.595918367346939e-05,
+ "loss": 0.265,
+ "step": 660
+ },
+ {
+ "epoch": 4.27,
+ "learning_rate": 5.5346938775510204e-05,
+ "loss": 0.2703,
+ "step": 670
+ },
+ {
+ "epoch": 4.33,
+ "learning_rate": 5.473469387755102e-05,
+ "loss": 0.2683,
+ "step": 680
+ },
+ {
+ "epoch": 4.39,
+ "learning_rate": 5.4122448979591835e-05,
+ "loss": 0.2666,
+ "step": 690
+ },
+ {
+ "epoch": 4.46,
+ "learning_rate": 5.351020408163265e-05,
+ "loss": 0.2571,
+ "step": 700
+ },
+ {
+ "epoch": 4.52,
+ "learning_rate": 5.289795918367348e-05,
+ "loss": 0.2723,
+ "step": 710
+ },
+ {
+ "epoch": 4.59,
+ "learning_rate": 5.2285714285714294e-05,
+ "loss": 0.2739,
+ "step": 720
+ },
+ {
+ "epoch": 4.65,
+ "learning_rate": 5.167346938775511e-05,
+ "loss": 0.2592,
+ "step": 730
+ },
+ {
+ "epoch": 4.71,
+ "learning_rate": 5.1061224489795925e-05,
+ "loss": 0.261,
+ "step": 740
+ },
+ {
+ "epoch": 4.78,
+ "learning_rate": 5.044897959183674e-05,
+ "loss": 0.2613,
+ "step": 750
+ },
+ {
+ "epoch": 4.84,
+ "learning_rate": 4.9836734693877555e-05,
+ "loss": 0.2641,
+ "step": 760
+ },
+ {
+ "epoch": 4.9,
+ "learning_rate": 4.922448979591837e-05,
+ "loss": 0.2648,
+ "step": 770
+ },
+ {
+ "epoch": 4.97,
+ "learning_rate": 4.8612244897959186e-05,
+ "loss": 0.2632,
+ "step": 780
+ },
+ {
+ "epoch": 5.03,
+ "learning_rate": 4.8e-05,
+ "loss": 0.2692,
+ "step": 790
+ },
+ {
+ "epoch": 5.1,
+ "learning_rate": 4.7387755102040816e-05,
+ "loss": 0.2526,
+ "step": 800
+ },
+ {
+ "epoch": 5.1,
+ "eval_loss": 0.273679256439209,
+ "eval_runtime": 179.1874,
+ "eval_samples_per_second": 11.162,
+ "eval_steps_per_second": 0.698,
+ "step": 800
+ },
+ {
+ "epoch": 5.16,
+ "learning_rate": 4.677551020408163e-05,
+ "loss": 0.261,
+ "step": 810
+ },
+ {
+ "epoch": 5.22,
+ "learning_rate": 4.616326530612245e-05,
+ "loss": 0.2459,
+ "step": 820
+ },
+ {
+ "epoch": 5.29,
+ "learning_rate": 4.555102040816326e-05,
+ "loss": 0.2595,
+ "step": 830
+ },
+ {
+ "epoch": 5.35,
+ "learning_rate": 4.4938775510204084e-05,
+ "loss": 0.2545,
+ "step": 840
+ },
+ {
+ "epoch": 5.41,
+ "learning_rate": 4.43265306122449e-05,
+ "loss": 0.2628,
+ "step": 850
+ },
+ {
+ "epoch": 5.48,
+ "learning_rate": 4.3714285714285715e-05,
+ "loss": 0.2606,
+ "step": 860
+ },
+ {
+ "epoch": 5.54,
+ "learning_rate": 4.310204081632653e-05,
+ "loss": 0.2512,
+ "step": 870
+ },
+ {
+ "epoch": 5.61,
+ "learning_rate": 4.2489795918367345e-05,
+ "loss": 0.2566,
+ "step": 880
+ },
+ {
+ "epoch": 5.67,
+ "learning_rate": 4.187755102040816e-05,
+ "loss": 0.2572,
+ "step": 890
+ },
+ {
+ "epoch": 5.73,
+ "learning_rate": 4.126530612244898e-05,
+ "loss": 0.247,
+ "step": 900
+ },
+ {
+ "epoch": 5.8,
+ "learning_rate": 4.06530612244898e-05,
+ "loss": 0.255,
+ "step": 910
+ },
+ {
+ "epoch": 5.86,
+ "learning_rate": 4.004081632653061e-05,
+ "loss": 0.263,
+ "step": 920
+ },
+ {
+ "epoch": 5.92,
+ "learning_rate": 3.942857142857143e-05,
+ "loss": 0.2622,
+ "step": 930
+ },
+ {
+ "epoch": 5.99,
+ "learning_rate": 3.8816326530612244e-05,
+ "loss": 0.2496,
+ "step": 940
+ },
+ {
+ "epoch": 6.05,
+ "learning_rate": 3.820408163265306e-05,
+ "loss": 0.258,
+ "step": 950
+ },
+ {
+ "epoch": 6.11,
+ "learning_rate": 3.7591836734693874e-05,
+ "loss": 0.2425,
+ "step": 960
+ },
+ {
+ "epoch": 6.18,
+ "learning_rate": 3.6979591836734696e-05,
+ "loss": 0.2444,
+ "step": 970
+ },
+ {
+ "epoch": 6.24,
+ "learning_rate": 3.636734693877551e-05,
+ "loss": 0.2476,
+ "step": 980
+ },
+ {
+ "epoch": 6.31,
+ "learning_rate": 3.575510204081633e-05,
+ "loss": 0.2429,
+ "step": 990
+ },
+ {
+ "epoch": 6.37,
+ "learning_rate": 3.514285714285714e-05,
+ "loss": 0.2512,
+ "step": 1000
+ },
+ {
+ "epoch": 6.37,
+ "eval_loss": 0.2628234326839447,
+ "eval_runtime": 178.8184,
+ "eval_samples_per_second": 11.185,
+ "eval_steps_per_second": 0.699,
+ "step": 1000
+ },
+ {
+ "epoch": 6.43,
+ "learning_rate": 3.453061224489796e-05,
+ "loss": 0.2484,
+ "step": 1010
+ },
+ {
+ "epoch": 6.5,
+ "learning_rate": 3.391836734693878e-05,
+ "loss": 0.2448,
+ "step": 1020
+ },
+ {
+ "epoch": 6.56,
+ "learning_rate": 3.3306122448979595e-05,
+ "loss": 0.2355,
+ "step": 1030
+ },
+ {
+ "epoch": 6.62,
+ "learning_rate": 3.269387755102041e-05,
+ "loss": 0.2445,
+ "step": 1040
+ },
+ {
+ "epoch": 6.69,
+ "learning_rate": 3.2081632653061225e-05,
+ "loss": 0.2477,
+ "step": 1050
+ },
+ {
+ "epoch": 6.75,
+ "learning_rate": 3.146938775510204e-05,
+ "loss": 0.2442,
+ "step": 1060
+ },
+ {
+ "epoch": 6.82,
+ "learning_rate": 3.0857142857142856e-05,
+ "loss": 0.2449,
+ "step": 1070
+ },
+ {
+ "epoch": 6.88,
+ "learning_rate": 3.0244897959183675e-05,
+ "loss": 0.2382,
+ "step": 1080
+ },
+ {
+ "epoch": 6.94,
+ "learning_rate": 2.963265306122449e-05,
+ "loss": 0.2392,
+ "step": 1090
+ },
+ {
+ "epoch": 7.01,
+ "learning_rate": 2.9020408163265305e-05,
+ "loss": 0.2418,
+ "step": 1100
+ },
+ {
+ "epoch": 7.07,
+ "learning_rate": 2.840816326530612e-05,
+ "loss": 0.2387,
+ "step": 1110
+ },
+ {
+ "epoch": 7.13,
+ "learning_rate": 2.7795918367346943e-05,
+ "loss": 0.2431,
+ "step": 1120
+ },
+ {
+ "epoch": 7.2,
+ "learning_rate": 2.7183673469387758e-05,
+ "loss": 0.2453,
+ "step": 1130
+ },
+ {
+ "epoch": 7.26,
+ "learning_rate": 2.6571428571428573e-05,
+ "loss": 0.2452,
+ "step": 1140
+ },
+ {
+ "epoch": 7.32,
+ "learning_rate": 2.595918367346939e-05,
+ "loss": 0.2411,
+ "step": 1150
+ },
+ {
+ "epoch": 7.39,
+ "learning_rate": 2.5346938775510204e-05,
+ "loss": 0.2448,
+ "step": 1160
+ },
+ {
+ "epoch": 7.45,
+ "learning_rate": 2.473469387755102e-05,
+ "loss": 0.2407,
+ "step": 1170
+ },
+ {
+ "epoch": 7.52,
+ "learning_rate": 2.4122448979591838e-05,
+ "loss": 0.2428,
+ "step": 1180
+ },
+ {
+ "epoch": 7.58,
+ "learning_rate": 2.3510204081632656e-05,
+ "loss": 0.2434,
+ "step": 1190
+ },
+ {
+ "epoch": 7.64,
+ "learning_rate": 2.289795918367347e-05,
+ "loss": 0.2435,
+ "step": 1200
+ },
+ {
+ "epoch": 7.64,
+ "eval_loss": 0.25768929719924927,
+ "eval_runtime": 179.2212,
+ "eval_samples_per_second": 11.159,
+ "eval_steps_per_second": 0.697,
+ "step": 1200
+ },
+ {
+ "epoch": 7.71,
+ "learning_rate": 2.2285714285714287e-05,
+ "loss": 0.2428,
+ "step": 1210
+ },
+ {
+ "epoch": 7.77,
+ "learning_rate": 2.1673469387755102e-05,
+ "loss": 0.2424,
+ "step": 1220
+ },
+ {
+ "epoch": 7.83,
+ "learning_rate": 2.106122448979592e-05,
+ "loss": 0.2354,
+ "step": 1230
+ },
+ {
+ "epoch": 7.9,
+ "learning_rate": 2.0448979591836736e-05,
+ "loss": 0.2402,
+ "step": 1240
+ },
+ {
+ "epoch": 7.96,
+ "learning_rate": 1.983673469387755e-05,
+ "loss": 0.2428,
+ "step": 1250
+ },
+ {
+ "epoch": 8.03,
+ "learning_rate": 1.922448979591837e-05,
+ "loss": 0.2302,
+ "step": 1260
+ },
+ {
+ "epoch": 8.09,
+ "learning_rate": 1.8612244897959185e-05,
+ "loss": 0.2252,
+ "step": 1270
+ },
+ {
+ "epoch": 8.15,
+ "learning_rate": 1.8e-05,
+ "loss": 0.2469,
+ "step": 1280
+ },
+ {
+ "epoch": 8.22,
+ "learning_rate": 1.738775510204082e-05,
+ "loss": 0.2455,
+ "step": 1290
+ },
+ {
+ "epoch": 8.28,
+ "learning_rate": 1.6775510204081634e-05,
+ "loss": 0.231,
+ "step": 1300
+ },
+ {
+ "epoch": 8.34,
+ "learning_rate": 1.616326530612245e-05,
+ "loss": 0.2331,
+ "step": 1310
+ },
+ {
+ "epoch": 8.41,
+ "learning_rate": 1.555102040816327e-05,
+ "loss": 0.2368,
+ "step": 1320
+ },
+ {
+ "epoch": 8.47,
+ "learning_rate": 1.4938775510204082e-05,
+ "loss": 0.2341,
+ "step": 1330
+ },
+ {
+ "epoch": 8.54,
+ "learning_rate": 1.4326530612244899e-05,
+ "loss": 0.2439,
+ "step": 1340
+ },
+ {
+ "epoch": 8.6,
+ "learning_rate": 1.3714285714285716e-05,
+ "loss": 0.2457,
+ "step": 1350
+ },
+ {
+ "epoch": 8.66,
+ "learning_rate": 1.3102040816326533e-05,
+ "loss": 0.2289,
+ "step": 1360
+ },
+ {
+ "epoch": 8.73,
+ "learning_rate": 1.2489795918367348e-05,
+ "loss": 0.2334,
+ "step": 1370
+ },
+ {
+ "epoch": 8.79,
+ "learning_rate": 1.1877551020408163e-05,
+ "loss": 0.2342,
+ "step": 1380
+ },
+ {
+ "epoch": 8.85,
+ "learning_rate": 1.1265306122448982e-05,
+ "loss": 0.2327,
+ "step": 1390
+ },
+ {
+ "epoch": 8.92,
+ "learning_rate": 1.0653061224489797e-05,
+ "loss": 0.2294,
+ "step": 1400
+ },
+ {
+ "epoch": 8.92,
+ "eval_loss": 0.25404658913612366,
+ "eval_runtime": 178.5583,
+ "eval_samples_per_second": 11.201,
+ "eval_steps_per_second": 0.7,
+ "step": 1400
+ }
+ ],
+ "max_steps": 1570,
+ "num_train_epochs": 10,
+ "total_flos": 4.529654718980424e+17,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-1400/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6db1d76c8c58296c6ccd7f8675679f24bb5bc1caeeabf527135e316aab994bff
+ size 3579
checkpoint-200/trainer_state.json DELETED
@@ -1,144 +0,0 @@
- {
- "best_metric": 0.6555210947990417,
- "best_model_checkpoint": "./lora-alpaca/checkpoint-200",
- "epoch": 1.2738853503184713,
- "global_step": 200,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.06,
- "learning_rate": 8.000000000000001e-06,
- "loss": 2.2228,
- "step": 10
- },
- {
- "epoch": 0.13,
- "learning_rate": 1.8e-05,
- "loss": 2.185,
- "step": 20
- },
- {
- "epoch": 0.19,
- "learning_rate": 2.8000000000000003e-05,
- "loss": 2.1452,
- "step": 30
- },
- {
- "epoch": 0.25,
- "learning_rate": 3.8e-05,
- "loss": 2.0232,
- "step": 40
- },
- {
- "epoch": 0.32,
- "learning_rate": 4.8e-05,
- "loss": 1.884,
- "step": 50
- },
- {
- "epoch": 0.38,
- "learning_rate": 5.6000000000000006e-05,
- "loss": 1.62,
- "step": 60
- },
- {
- "epoch": 0.45,
- "learning_rate": 6.6e-05,
- "loss": 1.3664,
- "step": 70
- },
- {
- "epoch": 0.51,
- "learning_rate": 7.6e-05,
- "loss": 1.2159,
- "step": 80
- },
- {
- "epoch": 0.57,
- "learning_rate": 8.6e-05,
- "loss": 1.1656,
- "step": 90
- },
- {
- "epoch": 0.64,
- "learning_rate": 9.6e-05,
- "loss": 1.0664,
- "step": 100
- },
- {
- "epoch": 0.7,
- "learning_rate": 9.838274932614556e-05,
- "loss": 1.0253,
- "step": 110
- },
- {
- "epoch": 0.76,
- "learning_rate": 9.568733153638815e-05,
- "loss": 0.9716,
- "step": 120
- },
- {
- "epoch": 0.83,
- "learning_rate": 9.299191374663073e-05,
- "loss": 0.9162,
- "step": 130
- },
- {
- "epoch": 0.89,
- "learning_rate": 9.029649595687331e-05,
- "loss": 0.8849,
- "step": 140
- },
- {
- "epoch": 0.96,
- "learning_rate": 8.76010781671159e-05,
- "loss": 0.8648,
- "step": 150
- },
- {
- "epoch": 1.02,
- "learning_rate": 8.49056603773585e-05,
- "loss": 0.8077,
- "step": 160
- },
- {
- "epoch": 1.08,
- "learning_rate": 8.221024258760108e-05,
- "loss": 0.7443,
- "step": 170
- },
- {
- "epoch": 1.15,
- "learning_rate": 7.951482479784367e-05,
- "loss": 0.7253,
- "step": 180
- },
- {
- "epoch": 1.21,
- "learning_rate": 7.681940700808625e-05,
- "loss": 0.6845,
- "step": 190
- },
- {
- "epoch": 1.27,
- "learning_rate": 7.412398921832885e-05,
- "loss": 0.6956,
- "step": 200
- },
- {
- "epoch": 1.27,
- "eval_loss": 0.6555210947990417,
- "eval_runtime": 179.8763,
- "eval_samples_per_second": 11.119,
- "eval_steps_per_second": 0.695,
- "step": 200
- }
- ],
- "max_steps": 471,
- "num_train_epochs": 3,
- "total_flos": 6.475033720179917e+16,
- "trial_name": null,
- "trial_params": null
- }
checkpoint-400/trainer_state.json DELETED
@@ -1,272 +0,0 @@
- {
- "best_metric": 0.3913075923919678,
- "best_model_checkpoint": "./lora-alpaca/checkpoint-400",
- "epoch": 2.5477707006369426,
- "global_step": 400,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.06,
- "learning_rate": 8.000000000000001e-06,
- "loss": 2.2228,
- "step": 10
- },
- {
- "epoch": 0.13,
- "learning_rate": 1.8e-05,
- "loss": 2.185,
- "step": 20
- },
- {
- "epoch": 0.19,
- "learning_rate": 2.8000000000000003e-05,
- "loss": 2.1452,
- "step": 30
- },
- {
- "epoch": 0.25,
- "learning_rate": 3.8e-05,
- "loss": 2.0232,
- "step": 40
- },
- {
- "epoch": 0.32,
- "learning_rate": 4.8e-05,
- "loss": 1.884,
- "step": 50
- },
- {
- "epoch": 0.38,
- "learning_rate": 5.6000000000000006e-05,
- "loss": 1.62,
- "step": 60
- },
- {
- "epoch": 0.45,
- "learning_rate": 6.6e-05,
- "loss": 1.3664,
- "step": 70
- },
- {
- "epoch": 0.51,
- "learning_rate": 7.6e-05,
- "loss": 1.2159,
- "step": 80
- },
- {
- "epoch": 0.57,
- "learning_rate": 8.6e-05,
- "loss": 1.1656,
- "step": 90
- },
- {
- "epoch": 0.64,
- "learning_rate": 9.6e-05,
- "loss": 1.0664,
- "step": 100
- },
- {
- "epoch": 0.7,
- "learning_rate": 9.838274932614556e-05,
- "loss": 1.0253,
- "step": 110
- },
- {
- "epoch": 0.76,
- "learning_rate": 9.568733153638815e-05,
- "loss": 0.9716,
- "step": 120
- },
- {
- "epoch": 0.83,
- "learning_rate": 9.299191374663073e-05,
- "loss": 0.9162,
- "step": 130
- },
- {
- "epoch": 0.89,
- "learning_rate": 9.029649595687331e-05,
- "loss": 0.8849,
- "step": 140
- },
- {
- "epoch": 0.96,
- "learning_rate": 8.76010781671159e-05,
- "loss": 0.8648,
- "step": 150
- },
- {
- "epoch": 1.02,
- "learning_rate": 8.49056603773585e-05,
- "loss": 0.8077,
- "step": 160
- },
- {
- "epoch": 1.08,
- "learning_rate": 8.221024258760108e-05,
- "loss": 0.7443,
- "step": 170
- },
- {
- "epoch": 1.15,
- "learning_rate": 7.951482479784367e-05,
- "loss": 0.7253,
- "step": 180
- },
- {
- "epoch": 1.21,
- "learning_rate": 7.681940700808625e-05,
- "loss": 0.6845,
- "step": 190
- },
- {
- "epoch": 1.27,
- "learning_rate": 7.412398921832885e-05,
- "loss": 0.6956,
- "step": 200
- },
- {
- "epoch": 1.27,
- "eval_loss": 0.6555210947990417,
- "eval_runtime": 179.8763,
- "eval_samples_per_second": 11.119,
- "eval_steps_per_second": 0.695,
- "step": 200
- },
- {
- "epoch": 1.34,
- "learning_rate": 7.142857142857143e-05,
- "loss": 0.6293,
- "step": 210
- },
- {
- "epoch": 1.4,
- "learning_rate": 6.873315363881401e-05,
- "loss": 0.5948,
- "step": 220
- },
- {
- "epoch": 1.46,
- "learning_rate": 6.60377358490566e-05,
- "loss": 0.5306,
- "step": 230
- },
- {
- "epoch": 1.53,
- "learning_rate": 6.33423180592992e-05,
- "loss": 0.5607,
- "step": 240
- },
- {
- "epoch": 1.59,
- "learning_rate": 6.0646900269541785e-05,
- "loss": 0.5095,
- "step": 250
- },
- {
- "epoch": 1.66,
- "learning_rate": 5.795148247978437e-05,
- "loss": 0.4947,
- "step": 260
- },
- {
- "epoch": 1.72,
- "learning_rate": 5.525606469002696e-05,
- "loss": 0.4856,
- "step": 270
- },
- {
- "epoch": 1.78,
- "learning_rate": 5.2560646900269536e-05,
- "loss": 0.4878,
- "step": 280
- },
- {
- "epoch": 1.85,
- "learning_rate": 4.986522911051213e-05,
- "loss": 0.4496,
- "step": 290
- },
- {
- "epoch": 1.91,
- "learning_rate": 4.716981132075472e-05,
- "loss": 0.4544,
- "step": 300
- },
- {
- "epoch": 1.97,
- "learning_rate": 4.447439353099731e-05,
- "loss": 0.4542,
- "step": 310
- },
- {
- "epoch": 2.04,
- "learning_rate": 4.1778975741239893e-05,
- "loss": 0.4556,
- "step": 320
- },
- {
- "epoch": 2.1,
- "learning_rate": 3.908355795148248e-05,
- "loss": 0.4014,
- "step": 330
- },
- {
- "epoch": 2.17,
- "learning_rate": 3.638814016172507e-05,
- "loss": 0.3893,
- "step": 340
- },
- {
- "epoch": 2.23,
- "learning_rate": 3.369272237196766e-05,
- "loss": 0.4197,
- "step": 350
- },
- {
- "epoch": 2.29,
- "learning_rate": 3.0997304582210244e-05,
- "loss": 0.3942,
- "step": 360
- },
- {
- "epoch": 2.36,
- "learning_rate": 2.830188679245283e-05,
- "loss": 0.3967,
- "step": 370
- },
- {
- "epoch": 2.42,
- "learning_rate": 2.5606469002695423e-05,
- "loss": 0.3848,
- "step": 380
- },
- {
- "epoch": 2.48,
- "learning_rate": 2.2911051212938006e-05,
- "loss": 0.3834,
- "step": 390
- },
- {
- "epoch": 2.55,
- "learning_rate": 2.0215633423180595e-05,
- "loss": 0.3647,
- "step": 400
- },
- {
- "epoch": 2.55,
- "eval_loss": 0.3913075923919678,
- "eval_runtime": 179.5793,
- "eval_samples_per_second": 11.137,
- "eval_steps_per_second": 0.696,
- "step": 400
- }
- ],
- "max_steps": 471,
- "num_train_epochs": 3,
- "total_flos": 1.2948924952215552e+17,
- "trial_name": null,
- "trial_params": null
- }