added terminal log to README

README.md
For this model, a VM with 2 T4 GPUs was used.

Note 1. The output directory was initially `lora-alpaca`; its contents were then moved to a new folder when the git repository was initialized.

## Log
```
(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llam
a-7b-hf' --data_path 'spider' --output_dir './lora-alpaca' --num_epochs 3 --batch_size 32 --micro_batch_size 16 --learning_rate '1e-4'
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your appli
cation as needed.
*****************************************


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further path
s...
warn(msg)
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further path
s...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: spider
output_dir: ./lora-alpaca
batch_size: 32
micro_batch_size: 16
num_epochs: 3
learning_rate: 0.0001
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca

Loading checkpoint shards: 100%|████████████████████████████████████████████████| 33/33 [01:19<00:00, 2.42s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 33/33 [01:19<00:00, 2.42s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset spider (/home/chrisdono/.cache/huggingface/datasets/spider/spider/1.0.0/4e5143d825a3895451569c8b9b55432b91a4bc2d04d390376c950837f4680daa)
  0%|                                                                           | 0/2 [00:00<?, ?it/s]
Found cached dataset spider (/home/chrisdono/.cache/huggingface/datasets/spider/spider/1.0.0/4e5143d825a3895451569c8b9b55432b91a4bc2d04d390376c950837f4680daa)
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.71it/s]
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 56.18it/s]
Found cached dataset csv (/home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 214.92it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cac
he-b310cf91933dea79.arrow and /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9632dc43aab
73df2.arrow
Found cached dataset csv (/home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 787.74it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cac
he-b310cf91933dea79.arrow and /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9632dc43aab
73df2.arrow
TRAIN DATA
{'Unnamed: 0': 2621, 'db_id': 'inn_1', 'query': 'SELECT decor , avg(basePrice) , min(basePrice) FROM Rooms GROUP BY decor;', 'question': 'What is the average minimum and price of the rooms f
or each different decor.', 'query_toks': "['SELECT' 'decor' ',' 'avg' '(' 'basePrice' ')' ',' 'min' '(' 'basePrice'\n ')' 'FROM' 'Rooms' 'GROUP' 'BY' 'decor' ';']", 'query_toks_no_value': "['
select' 'decor' ',' 'avg' '(' 'baseprice' ')' ',' 'min' '(' 'baseprice'\n ')' 'from' 'rooms' 'group' 'by' 'decor']", 'question_toks': "['What' 'is' 'the' 'average' 'minimum' 'and' 'price' 'of
' 'the' 'rooms'\n 'for' 'each' 'different' 'decor' '.']", 'db_context': "['room id', 'room name', 'beds', 'bed type', 'max occupancy', 'base price', 'decor', 'code', 'room', 'check in', 'chec
k out', 'rate', 'last name', 'first name', 'adults', 'kids']", 'input_ids': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889,
14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 5618, 338, 278, 6588, 9212, 322, 8666, 310, 278, 19600, 363, 1269, 1422, 10200, 2
9889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 8345, 1178, 742, 525, 8345, 1024, 742, 525, 2580, 29879, 742, 525, 2580, 1134, 742, 525, 3317, 6919, 6906, 742, 525, 3188, 8666, 742, 525, 1
9557, 742, 525, 401, 742, 525, 8345, 742, 525, 3198, 297, 742, 525, 3198, 714, 742, 525, 10492, 742, 525, 4230, 1024, 742, 525, 4102, 1024, 742, 525, 328, 499, 29879, 742, 525, 29895, 4841, 2
033, 13, 13, 2277, 29937, 13291, 29901, 13, 6404, 10200, 1919, 1029, 29887, 29898, 3188, 13026, 29897, 1919, 29871, 1375, 29898, 3188, 13026, 29897, 3895, 1528, 4835, 15345, 6770, 10200, 2993
6, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411,
385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 5618, 338, 278, 6588, 9212, 322, 8666, 3
10, 278, 19600, 363, 1269, 1422, 10200, 29889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 8345, 1178, 742, 525, 8345, 1024, 742, 525, 2580, 29879, 742, 525, 2580, 1134, 742, 525, 3317, 6919
, 6906, 742, 525, 3188, 8666, 742, 525, 19557, 742, 525, 401, 742, 525, 8345, 742, 525, 3198, 297, 742, 525, 3198, 714, 742, 525, 10492, 742, 525, 4230, 1024, 742, 525, 4102, 1024, 742, 525,
328, 499, 29879, 742, 525, 29895, 4841, 2033, 13, 13, 2277, 29937, 13291, 29901, 13, 6404, 10200, 1919, 1029, 29887, 29898, 3188, 13026, 29897, 1919, 29871, 1375, 29898, 3188, 13026, 29897, 3
895, 1528, 4835, 15345, 6770, 10200, 29936, 0]}
TRAIN DATA
{'Unnamed: 0': 4767, 'db_id': 'department_store', 'query': 'SELECT product_id FROM Order_Items GROUP BY product_id HAVING count(*) > 3 UNION SELECT product_id FROM Product_Suppliers GROUP B
Y product_id HAVING sum(total_amount_purchased) > 80000', 'question': 'Return the ids of all products that were ordered more than three times or supplied more than 80000.', 'query_toks': "[
'SELECT' 'product_id' 'FROM' 'Order_Items' 'GROUP' 'BY' 'product_id'\n 'HAVING' 'count' '(' '*' ')' '>' '3' 'UNION' 'SELECT' 'product_id' 'FROM'\n 'Product_Suppliers' 'GROUP' 'BY' 'product_id
' 'HAVING' 'sum' '('\n 'total_amount_purchased' ')' '>' '80000']", 'query_toks_no_value': "['select' 'product_id' 'from' 'order_items' 'group' 'by' 'product_id'\n 'having' 'count' '(' '*' ')'
'>' 'value' 'union' 'select' 'product_id'\n 'from' 'product_suppliers' 'group' 'by' 'product_id' 'having' 'sum' '('\n 'total_amount_purchased' ')' '>' 'value']", 'question_toks': "['Return'
'the' 'ids' 'of' 'all' 'products' 'that' 'were' 'ordered' 'more'\n 'than' 'three' 'times' 'or' 'supplied' 'more' 'than' '80000' '.']", 'db_context': "['address id', 'address details', 'staff
id', 'staff gender', 'staff name', 'supplier id', 'supplier name', 'supplier phone', 'department store chain id', 'department store chain name', 'customer id', 'payment method code', 'custome
r code', 'customer name', 'customer address', 'customer phone', 'customer email', 'product id', 'product type code', 'product name', 'product price', 'supplier id', 'address id', 'date from',
'date to', 'customer id', 'address id', 'date from', 'date to', 'order id', 'customer id', 'order status code', 'order date', 'department store id', 'department store chain id', 'store name'
, 'store address', 'store phone', 'store email', 'department id', 'department store id', 'department name', 'order item id', 'order id', 'product id', 'product id', 'supplier id', 'date suppl
ied from', 'date supplied to', 'total amount purchased', 'total value purchased', 'staff id', 'department id', 'date assigned from', 'job title code', 'date assigned to']", 'input_ids': [0, 1
3866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 2
9937, 2799, 4080, 29901, 13, 11609, 278, 18999, 310, 599, 9316, 393, 892, 10372, 901, 1135, 2211, 3064, 470, 19056, 901, 1135, 29871, 29947, 29900, 29900, 29900, 29900, 29889, 13, 13, 2277, 2
9937, 10567, 29901, 13, 1839, 7328, 1178, 742, 525, 7328, 4902, 742, 525, 303, 3470, 1178, 742, 525, 303, 3470, 23346, 742, 525, 303, 3470, 1024, 742, 525, 19303, 4926, 1178, 742, 525, 19303,
4926, 1024, 742, 525, 19303, 4926, 9008, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 311, 8076, 3787, 9704, 1024, 742, 525, 15539, 1178, 742, 525, 27825, 1158, 775, 742, 525, 15539, 775
, 742, 525, 15539, 1024, 742, 525, 15539, 3211, 742, 525, 15539, 9008, 742, 525, 15539, 4876, 742, 525, 4704, 1178, 742, 525, 4704, 1134, 775, 742, 525, 4704, 1024, 742, 525, 4704, 8666, 742,
525, 19303, 4926, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 15539, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 2098,
1178, 742, 525, 15539, 1178, 742, 525, 2098, 4660, 775, 742, 525, 2098, 2635, 742, 525, 311, 8076, 3787, 1178, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 8899, 1024, 742, 525, 8899, 32
11, 742, 525, 8899, 9008, 742, 525, 8899, 4876, 742, 525, 311, 8076, 1178, 742, 525, 311, 8076, 3787], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 298
92, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 11609, 278, 18999, 3
10, 599, 9316, 393, 892, 10372, 901, 1135, 2211, 3064, 470, 19056, 901, 1135, 29871, 29947, 29900, 29900, 29900, 29900, 29889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 7328, 1178, 742, 52
5, 7328, 4902, 742, 525, 303, 3470, 1178, 742, 525, 303, 3470, 23346, 742, 525, 303, 3470, 1024, 742, 525, 19303, 4926, 1178, 742, 525, 19303, 4926, 1024, 742, 525, 19303, 4926, 9008, 742, 52
5, 311, 8076, 3787, 9704, 1178, 742, 525, 311, 8076, 3787, 9704, 1024, 742, 525, 15539, 1178, 742, 525, 27825, 1158, 775, 742, 525, 15539, 775, 742, 525, 15539, 1024, 742, 525, 15539, 3211, 7
42, 525, 15539, 9008, 742, 525, 15539, 4876, 742, 525, 4704, 1178, 742, 525, 4704, 1134, 775, 742, 525, 4704, 1024, 742, 525, 4704, 8666, 742, 525, 19303, 4926, 1178, 742, 525, 7328, 1178, 74
2, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 15539, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 2098, 1178, 742, 525, 15539, 1178, 742, 525, 2098, 466
0, 775, 742, 525, 2098, 2635, 742, 525, 311, 8076, 3787, 1178, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 8899, 1024, 742, 525, 8899, 3211, 742, 525, 8899, 9008, 742, 525, 8899, 4876, 7
42, 525, 311, 8076, 1178, 742, 525, 311, 8076, 3787]}
{'loss': 2.2228, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06}
{'loss': 2.185, 'learning_rate': 1.8e-05, 'epoch': 0.13}
{'loss': 2.1452, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.19}
{'loss': 2.0232, 'learning_rate': 3.8e-05, 'epoch': 0.25}
{'loss': 1.884, 'learning_rate': 4.8e-05, 'epoch': 0.32}
{'loss': 1.62, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.38}
{'loss': 1.3664, 'learning_rate': 6.6e-05, 'epoch': 0.45}
{'loss': 1.2159, 'learning_rate': 7.6e-05, 'epoch': 0.51}
{'loss': 1.1656, 'learning_rate': 8.6e-05, 'epoch': 0.57}
{'loss': 1.0664, 'learning_rate': 9.6e-05, 'epoch': 0.64}
{'loss': 1.0253, 'learning_rate': 9.838274932614556e-05, 'epoch': 0.7}
{'loss': 0.9716, 'learning_rate': 9.568733153638815e-05, 'epoch': 0.76}
{'loss': 0.9162, 'learning_rate': 9.299191374663073e-05, 'epoch': 0.83}
{'loss': 0.8849, 'learning_rate': 9.029649595687331e-05, 'epoch': 0.89}
{'loss': 0.8648, 'learning_rate': 8.76010781671159e-05, 'epoch': 0.96}
{'loss': 0.8077, 'learning_rate': 8.49056603773585e-05, 'epoch': 1.02}
{'loss': 0.7443, 'learning_rate': 8.221024258760108e-05, 'epoch': 1.08}
{'loss': 0.7253, 'learning_rate': 7.951482479784367e-05, 'epoch': 1.15}
{'loss': 0.6845, 'learning_rate': 7.681940700808625e-05, 'epoch': 1.21}
{'loss': 0.6956, 'learning_rate': 7.412398921832885e-05, 'epoch': 1.27}
{'eval_loss': 0.6555210947990417, 'eval_runtime': 179.8763, 'eval_samples_per_second': 11.119, 'eval_steps_per_second': 0.695, 'epoch': 1.27}
{'loss': 0.6293, 'learning_rate': 7.142857142857143e-05, 'epoch': 1.34}
{'loss': 0.5948, 'learning_rate': 6.873315363881401e-05, 'epoch': 1.4}
{'loss': 0.5306, 'learning_rate': 6.60377358490566e-05, 'epoch': 1.46}
{'loss': 0.5607, 'learning_rate': 6.33423180592992e-05, 'epoch': 1.53}
{'loss': 0.5095, 'learning_rate': 6.0646900269541785e-05, 'epoch': 1.59}
{'loss': 0.4947, 'learning_rate': 5.795148247978437e-05, 'epoch': 1.66}
{'loss': 0.4856, 'learning_rate': 5.525606469002696e-05, 'epoch': 1.72}
{'loss': 0.4878, 'learning_rate': 5.2560646900269536e-05, 'epoch': 1.78}
{'loss': 0.4496, 'learning_rate': 4.986522911051213e-05, 'epoch': 1.85}
{'loss': 0.4544, 'learning_rate': 4.716981132075472e-05, 'epoch': 1.91}
{'loss': 0.4542, 'learning_rate': 4.447439353099731e-05, 'epoch': 1.97}
{'loss': 0.4556, 'learning_rate': 4.1778975741239893e-05, 'epoch': 2.04}
{'loss': 0.4014, 'learning_rate': 3.908355795148248e-05, 'epoch': 2.1}
{'loss': 0.3893, 'learning_rate': 3.638814016172507e-05, 'epoch': 2.17}
{'loss': 0.4197, 'learning_rate': 3.369272237196766e-05, 'epoch': 2.23}
{'loss': 0.3942, 'learning_rate': 3.0997304582210244e-05, 'epoch': 2.29}
{'loss': 0.3967, 'learning_rate': 2.830188679245283e-05, 'epoch': 2.36}
{'loss': 0.3848, 'learning_rate': 2.5606469002695423e-05, 'epoch': 2.42}
{'loss': 0.3834, 'learning_rate': 2.2911051212938006e-05, 'epoch': 2.48}
{'loss': 0.3647, 'learning_rate': 2.0215633423180595e-05, 'epoch': 2.55}
{'eval_loss': 0.3913075923919678, 'eval_runtime': 179.5793, 'eval_samples_per_second': 11.137, 'eval_steps_per_second': 0.696, 'epoch': 2.55}
{'loss': 0.3703, 'learning_rate': 1.752021563342318e-05, 'epoch': 2.61}
{'loss': 0.3776, 'learning_rate': 1.4824797843665769e-05, 'epoch': 2.68}
{'loss': 0.3509, 'learning_rate': 1.2129380053908356e-05, 'epoch': 2.74}
{'loss': 0.3622, 'learning_rate': 9.433962264150944e-06, 'epoch': 2.8}
{'loss': 0.351, 'learning_rate': 6.738544474393531e-06, 'epoch': 2.87}
{'loss': 0.351, 'learning_rate': 6.738544474393531e-06, 'epoch': 2.87}
{'loss': 0.3497, 'learning_rate': 4.0431266846361185e-06, 'epoch': 2.93}
{'loss': 0.369, 'learning_rate': 1.3477088948787064e-06, 'epoch': 2.99}
100%|██████████████████████████████████████████████████████████| 471/471 [1:25:47<00:00, 8.04s/it]
{'train_runtime': 5148.4044, 'train_samples_per_second': 2.914, 'train_steps_per_second': 0.091, 'train_loss': 0.7860396517057074, 'epoch': 3.0}
100%|██████████████████████████████████████████████████████████| 471/471 [1:25:47<00:00, 10.93s/it]
```
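The `trainable params: 4194304` line in the log can be sanity-checked against the LoRA settings it prints. Assuming LLaMA-7B's published shape (32 decoder layers, hidden size 4096; these two numbers are not in the log itself), adapting `q_proj` and `v_proj` with `lora_r: 8` gives:

```python
# Estimate LoRA trainable parameters for the run logged above.
# Assumed LLaMA-7B shapes: 32 decoder layers, hidden size 4096.
n_layers = 32
d_model = 4096
lora_r = 8              # lora_r from the training params above
targets_per_layer = 2   # q_proj and v_proj

# Each adapted d_model x d_model projection gains two low-rank factors:
# A (r x d_model) and B (d_model x r).
params_per_module = lora_r * d_model + d_model * lora_r
trainable = n_layers * targets_per_layer * params_per_module
print(trainable)  # 4194304, matching 'trainable params' in the log

all_params = 6_742_609_920  # 'all params' from the log
print(100 * trainable / all_params)  # ~0.0622, the logged trainable%
```

This is why the adapter checkpoint is tiny relative to the 6.7B-parameter base model.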
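The `471/471` step count is likewise consistent with the logged parameters, assuming the Spider train split holds 7000 examples (a property of the dataset, not stated in this log) and the accumulation arithmetic used in alpaca-lora's `finetune.py`:

```python
import math

# Assumed dataset size: Spider's train split has 7000 examples;
# val_set_size of them are held out for eval (from the logged params).
dataset_size = 7000
val_set_size = 2000
batch_size = 32          # effective examples per optimizer step
micro_batch_size = 16
world_size = 2           # the two T4 GPUs
num_epochs = 3

# Per-GPU gradient accumulation, as computed in finetune.py:
grad_accum = batch_size // micro_batch_size // world_size
print(grad_accum)        # 1: each step is 2 GPUs x 16 examples, no accumulation

train_examples = dataset_size - val_set_size
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch * num_epochs)  # 471, matching '471/471' in the log
```

The same arithmetic explains the throughput line: 471 steps at ~32 examples each over 5148 s is roughly the logged 2.914 train_samples_per_second.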