added terminal log to README

README.md
For this model, a VM with 2 T4 GPUs was used.

Note 1. The output directory was initially `lora-alpaca`; its contents were then moved to a new folder when the git repository was initialized.

## Log
```
(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llam
a-7b-hf' --data_path 'spider' --output_dir './lora-alpaca' --num_epochs 3 --batch_size 32 --micro_batch_size 16 --learning_rate '1e-4'
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your appli
cation as needed.
*****************************************


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further path
s...
warn(msg)
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further path
s...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: spider
output_dir: ./lora-alpaca
batch_size: 32
micro_batch_size: 16
num_epochs: 3
learning_rate: 0.0001
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca

Loading checkpoint shards: 100%|████████████████████████████████████████████████| 33/33 [01:19<00:00, 2.42s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 33/33 [01:19<00:00, 2.42s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset spider (/home/chrisdono/.cache/huggingface/datasets/spider/spider/1.0.0/4e5143d825a3895451569c8b9b55432b91a4bc2d04d390376c950837f4680daa)
  0%|                                                                           | 0/2 [00:00<?, ?it/s]
Found cached dataset spider (/home/chrisdono/.cache/huggingface/datasets/spider/spider/1.0.0/4e5143d825a3895451569c8b9b55432b91a4bc2d04d390376c950837f4680daa)
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.71it/s]
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 56.18it/s]
Found cached dataset csv (/home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 214.92it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cac
he-b310cf91933dea79.arrow and /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9632dc43aab
73df2.arrow
Found cached dataset csv (/home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 787.74it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cac
he-b310cf91933dea79.arrow and /home/chrisdono/.cache/huggingface/datasets/csv/default-68889607ac077205/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9632dc43aab
73df2.arrow
TRAIN DATA
{'Unnamed: 0': 2621, 'db_id': 'inn_1', 'query': 'SELECT decor , avg(basePrice) , min(basePrice) FROM Rooms GROUP BY decor;', 'question': 'What is the average minimum and price of the rooms f
or each different decor.', 'query_toks': "['SELECT' 'decor' ',' 'avg' '(' 'basePrice' ')' ',' 'min' '(' 'basePrice'\n ')' 'FROM' 'Rooms' 'GROUP' 'BY' 'decor' ';']", 'query_toks_no_value': "['
select' 'decor' ',' 'avg' '(' 'baseprice' ')' ',' 'min' '(' 'baseprice'\n ')' 'from' 'rooms' 'group' 'by' 'decor']", 'question_toks': "['What' 'is' 'the' 'average' 'minimum' 'and' 'price' 'of
' 'the' 'rooms'\n 'for' 'each' 'different' 'decor' '.']", 'db_context': "['room id', 'room name', 'beds', 'bed type', 'max occupancy', 'base price', 'decor', 'code', 'room', 'check in', 'chec
k out', 'rate', 'last name', 'first name', 'adults', 'kids']", 'input_ids': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889,
14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 5618, 338, 278, 6588, 9212, 322, 8666, 310, 278, 19600, 363, 1269, 1422, 10200, 2
9889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 8345, 1178, 742, 525, 8345, 1024, 742, 525, 2580, 29879, 742, 525, 2580, 1134, 742, 525, 3317, 6919, 6906, 742, 525, 3188, 8666, 742, 525, 1
9557, 742, 525, 401, 742, 525, 8345, 742, 525, 3198, 297, 742, 525, 3198, 714, 742, 525, 10492, 742, 525, 4230, 1024, 742, 525, 4102, 1024, 742, 525, 328, 499, 29879, 742, 525, 29895, 4841, 2
033, 13, 13, 2277, 29937, 13291, 29901, 13, 6404, 10200, 1919, 1029, 29887, 29898, 3188, 13026, 29897, 1919, 29871, 1375, 29898, 3188, 13026, 29897, 3895, 1528, 4835, 15345, 6770, 10200, 2993
6, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411,
385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 5618, 338, 278, 6588, 9212, 322, 8666, 3
10, 278, 19600, 363, 1269, 1422, 10200, 29889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 8345, 1178, 742, 525, 8345, 1024, 742, 525, 2580, 29879, 742, 525, 2580, 1134, 742, 525, 3317, 6919
, 6906, 742, 525, 3188, 8666, 742, 525, 19557, 742, 525, 401, 742, 525, 8345, 742, 525, 3198, 297, 742, 525, 3198, 714, 742, 525, 10492, 742, 525, 4230, 1024, 742, 525, 4102, 1024, 742, 525,
328, 499, 29879, 742, 525, 29895, 4841, 2033, 13, 13, 2277, 29937, 13291, 29901, 13, 6404, 10200, 1919, 1029, 29887, 29898, 3188, 13026, 29897, 1919, 29871, 1375, 29898, 3188, 13026, 29897, 3
895, 1528, 4835, 15345, 6770, 10200, 29936, 0]}
TRAIN DATA
{'Unnamed: 0': 4767, 'db_id': 'department_store', 'query': 'SELECT product_id FROM Order_Items GROUP BY product_id HAVING count(*) > 3 UNION SELECT product_id FROM Product_Suppliers GROUP B
Y product_id HAVING sum(total_amount_purchased) > 80000', 'question': 'Return the ids of all products that were ordered more than three times or supplied more than 80000.', 'query_toks': "[
'SELECT' 'product_id' 'FROM' 'Order_Items' 'GROUP' 'BY' 'product_id'\n 'HAVING' 'count' '(' '*' ')' '>' '3' 'UNION' 'SELECT' 'product_id' 'FROM'\n 'Product_Suppliers' 'GROUP' 'BY' 'product_id
' 'HAVING' 'sum' '('\n 'total_amount_purchased' ')' '>' '80000']", 'query_toks_no_value': "['select' 'product_id' 'from' 'order_items' 'group' 'by' 'product_id'\n 'having' 'count' '(' '*' ')'
'>' 'value' 'union' 'select' 'product_id'\n 'from' 'product_suppliers' 'group' 'by' 'product_id' 'having' 'sum' '('\n 'total_amount_purchased' ')' '>' 'value']", 'question_toks': "['Return'
'the' 'ids' 'of' 'all' 'products' 'that' 'were' 'ordered' 'more'\n 'than' 'three' 'times' 'or' 'supplied' 'more' 'than' '80000' '.']", 'db_context': "['address id', 'address details', 'staff
id', 'staff gender', 'staff name', 'supplier id', 'supplier name', 'supplier phone', 'department store chain id', 'department store chain name', 'customer id', 'payment method code', 'custome
r code', 'customer name', 'customer address', 'customer phone', 'customer email', 'product id', 'product type code', 'product name', 'product price', 'supplier id', 'address id', 'date from',
'date to', 'customer id', 'address id', 'date from', 'date to', 'order id', 'customer id', 'order status code', 'order date', 'department store id', 'department store chain id', 'store name'
, 'store address', 'store phone', 'store email', 'department id', 'department store id', 'department name', 'order item id', 'order id', 'product id', 'product id', 'supplier id', 'date suppl
ied from', 'date supplied to', 'total amount purchased', 'total value purchased', 'staff id', 'department id', 'date assigned from', 'job title code', 'date assigned to']", 'input_ids': [0, 1
3866, 338, 385, 15278, 393, 16612, 263, 3414, 29892, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 2
9937, 2799, 4080, 29901, 13, 11609, 278, 18999, 310, 599, 9316, 393, 892, 10372, 901, 1135, 2211, 3064, 470, 19056, 901, 1135, 29871, 29947, 29900, 29900, 29900, 29900, 29889, 13, 13, 2277, 2
9937, 10567, 29901, 13, 1839, 7328, 1178, 742, 525, 7328, 4902, 742, 525, 303, 3470, 1178, 742, 525, 303, 3470, 23346, 742, 525, 303, 3470, 1024, 742, 525, 19303, 4926, 1178, 742, 525, 19303,
4926, 1024, 742, 525, 19303, 4926, 9008, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 311, 8076, 3787, 9704, 1024, 742, 525, 15539, 1178, 742, 525, 27825, 1158, 775, 742, 525, 15539, 775
, 742, 525, 15539, 1024, 742, 525, 15539, 3211, 742, 525, 15539, 9008, 742, 525, 15539, 4876, 742, 525, 4704, 1178, 742, 525, 4704, 1134, 775, 742, 525, 4704, 1024, 742, 525, 4704, 8666, 742,
525, 19303, 4926, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 15539, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 2098,
1178, 742, 525, 15539, 1178, 742, 525, 2098, 4660, 775, 742, 525, 2098, 2635, 742, 525, 311, 8076, 3787, 1178, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 8899, 1024, 742, 525, 8899, 32
11, 742, 525, 8899, 9008, 742, 525, 8899, 4876, 742, 525, 311, 8076, 1178, 742, 525, 311, 8076, 3787], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [0, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 298
92, 3300, 2859, 411, 385, 1881, 393, 8128, 4340, 3030, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 11609, 278, 18999, 3
10, 599, 9316, 393, 892, 10372, 901, 1135, 2211, 3064, 470, 19056, 901, 1135, 29871, 29947, 29900, 29900, 29900, 29900, 29889, 13, 13, 2277, 29937, 10567, 29901, 13, 1839, 7328, 1178, 742, 52
5, 7328, 4902, 742, 525, 303, 3470, 1178, 742, 525, 303, 3470, 23346, 742, 525, 303, 3470, 1024, 742, 525, 19303, 4926, 1178, 742, 525, 19303, 4926, 1024, 742, 525, 19303, 4926, 9008, 742, 52
5, 311, 8076, 3787, 9704, 1178, 742, 525, 311, 8076, 3787, 9704, 1024, 742, 525, 15539, 1178, 742, 525, 27825, 1158, 775, 742, 525, 15539, 775, 742, 525, 15539, 1024, 742, 525, 15539, 3211, 7
42, 525, 15539, 9008, 742, 525, 15539, 4876, 742, 525, 4704, 1178, 742, 525, 4704, 1134, 775, 742, 525, 4704, 1024, 742, 525, 4704, 8666, 742, 525, 19303, 4926, 1178, 742, 525, 7328, 1178, 74
2, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 15539, 1178, 742, 525, 7328, 1178, 742, 525, 1256, 515, 742, 525, 1256, 304, 742, 525, 2098, 1178, 742, 525, 15539, 1178, 742, 525, 2098, 466
0, 775, 742, 525, 2098, 2635, 742, 525, 311, 8076, 3787, 1178, 742, 525, 311, 8076, 3787, 9704, 1178, 742, 525, 8899, 1024, 742, 525, 8899, 3211, 742, 525, 8899, 9008, 742, 525, 8899, 4876, 7
42, 525, 311, 8076, 1178, 742, 525, 311, 8076, 3787]}
{'loss': 2.2228, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06}
{'loss': 2.185, 'learning_rate': 1.8e-05, 'epoch': 0.13}
{'loss': 2.1452, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.19}
{'loss': 2.0232, 'learning_rate': 3.8e-05, 'epoch': 0.25}
{'loss': 1.884, 'learning_rate': 4.8e-05, 'epoch': 0.32}
{'loss': 1.62, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.38}
{'loss': 1.3664, 'learning_rate': 6.6e-05, 'epoch': 0.45}
{'loss': 1.2159, 'learning_rate': 7.6e-05, 'epoch': 0.51}
{'loss': 1.1656, 'learning_rate': 8.6e-05, 'epoch': 0.57}
{'loss': 1.0664, 'learning_rate': 9.6e-05, 'epoch': 0.64}
{'loss': 1.0253, 'learning_rate': 9.838274932614556e-05, 'epoch': 0.7}
{'loss': 0.9716, 'learning_rate': 9.568733153638815e-05, 'epoch': 0.76}
{'loss': 0.9162, 'learning_rate': 9.299191374663073e-05, 'epoch': 0.83}
{'loss': 0.8849, 'learning_rate': 9.029649595687331e-05, 'epoch': 0.89}
{'loss': 0.8648, 'learning_rate': 8.76010781671159e-05, 'epoch': 0.96}
{'loss': 0.8077, 'learning_rate': 8.49056603773585e-05, 'epoch': 1.02}
{'loss': 0.7443, 'learning_rate': 8.221024258760108e-05, 'epoch': 1.08}
{'loss': 0.7253, 'learning_rate': 7.951482479784367e-05, 'epoch': 1.15}
{'loss': 0.6845, 'learning_rate': 7.681940700808625e-05, 'epoch': 1.21}
{'loss': 0.6956, 'learning_rate': 7.412398921832885e-05, 'epoch': 1.27}
{'eval_loss': 0.6555210947990417, 'eval_runtime': 179.8763, 'eval_samples_per_second': 11.119, 'eval_steps_per_second': 0.695, 'epoch': 1.27}
{'loss': 0.6293, 'learning_rate': 7.142857142857143e-05, 'epoch': 1.34}
{'loss': 0.5948, 'learning_rate': 6.873315363881401e-05, 'epoch': 1.4}
{'loss': 0.5306, 'learning_rate': 6.60377358490566e-05, 'epoch': 1.46}
{'loss': 0.5607, 'learning_rate': 6.33423180592992e-05, 'epoch': 1.53}
{'loss': 0.5095, 'learning_rate': 6.0646900269541785e-05, 'epoch': 1.59}
{'loss': 0.4947, 'learning_rate': 5.795148247978437e-05, 'epoch': 1.66}
{'loss': 0.4856, 'learning_rate': 5.525606469002696e-05, 'epoch': 1.72}
{'loss': 0.4878, 'learning_rate': 5.2560646900269536e-05, 'epoch': 1.78}
{'loss': 0.4496, 'learning_rate': 4.986522911051213e-05, 'epoch': 1.85}
{'loss': 0.4544, 'learning_rate': 4.716981132075472e-05, 'epoch': 1.91}
{'loss': 0.4542, 'learning_rate': 4.447439353099731e-05, 'epoch': 1.97}
{'loss': 0.4556, 'learning_rate': 4.1778975741239893e-05, 'epoch': 2.04}
{'loss': 0.4014, 'learning_rate': 3.908355795148248e-05, 'epoch': 2.1}
{'loss': 0.3893, 'learning_rate': 3.638814016172507e-05, 'epoch': 2.17}
{'loss': 0.4197, 'learning_rate': 3.369272237196766e-05, 'epoch': 2.23}
{'loss': 0.3942, 'learning_rate': 3.0997304582210244e-05, 'epoch': 2.29}
{'loss': 0.3967, 'learning_rate': 2.830188679245283e-05, 'epoch': 2.36}
{'loss': 0.3848, 'learning_rate': 2.5606469002695423e-05, 'epoch': 2.42}
{'loss': 0.3834, 'learning_rate': 2.2911051212938006e-05, 'epoch': 2.48}
{'loss': 0.3647, 'learning_rate': 2.0215633423180595e-05, 'epoch': 2.55}
{'eval_loss': 0.3913075923919678, 'eval_runtime': 179.5793, 'eval_samples_per_second': 11.137, 'eval_steps_per_second': 0.696, 'epoch': 2.55}
{'loss': 0.3703, 'learning_rate': 1.752021563342318e-05, 'epoch': 2.61}
{'loss': 0.3776, 'learning_rate': 1.4824797843665769e-05, 'epoch': 2.68}
{'loss': 0.3509, 'learning_rate': 1.2129380053908356e-05, 'epoch': 2.74}
{'loss': 0.3622, 'learning_rate': 9.433962264150944e-06, 'epoch': 2.8}
{'loss': 0.351, 'learning_rate': 6.738544474393531e-06, 'epoch': 2.87}
{'loss': 0.351, 'learning_rate': 6.738544474393531e-06, 'epoch': 2.87}
{'loss': 0.3497, 'learning_rate': 4.0431266846361185e-06, 'epoch': 2.93}
{'loss': 0.369, 'learning_rate': 1.3477088948787064e-06, 'epoch': 2.99}
100%|██████████████████████████████████████████████████████████| 471/471 [1:25:47<00:00, 8.04s/it]
{'train_runtime': 5148.4044, 'train_samples_per_second': 2.914, 'train_steps_per_second': 0.091, 'train_loss': 0.7860396517057074, 'epoch': 3.0}
100%|██████████████████████████████████████████████████████████| 471/471 [1:25:47<00:00, 10.93s/it]
```
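The `trainable params: 4194304` line in the log can be sanity-checked against the LoRA settings it prints. Assuming LLaMA-7B's published shape (32 decoder layers, hidden size 4096; these two numbers are not in the log itself), adapting `q_proj` and `v_proj` with `lora_r: 8` gives:

```python
# Estimate LoRA trainable parameters for the run logged above.
# Assumed LLaMA-7B shapes: 32 decoder layers, hidden size 4096.
n_layers = 32
d_model = 4096
lora_r = 8              # lora_r from the training params above
targets_per_layer = 2   # q_proj and v_proj

# Each adapted d_model x d_model projection gains two low-rank factors:
# A (r x d_model) and B (d_model x r).
params_per_module = lora_r * d_model + d_model * lora_r
trainable = n_layers * targets_per_layer * params_per_module
print(trainable)  # 4194304, matching 'trainable params' in the log

all_params = 6_742_609_920  # 'all params' from the log
print(100 * trainable / all_params)  # ~0.0622, the logged trainable%
```

This is why the adapter checkpoint is tiny relative to the 6.7B-parameter base model.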
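The `471/471` step count is likewise consistent with the logged parameters, assuming the Spider train split holds 7000 examples (a property of the dataset, not stated in this log) and the accumulation arithmetic used in alpaca-lora's `finetune.py`:

```python
import math

# Assumed dataset size: Spider's train split has 7000 examples;
# val_set_size of them are held out for eval (from the logged params).
dataset_size = 7000
val_set_size = 2000
batch_size = 32          # effective examples per optimizer step
micro_batch_size = 16
world_size = 2           # the two T4 GPUs
num_epochs = 3

# Per-GPU gradient accumulation, as computed in finetune.py:
grad_accum = batch_size // micro_batch_size // world_size
print(grad_accum)        # 1: each step is 2 GPUs x 16 examples, no accumulation

train_examples = dataset_size - val_set_size
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch * num_epochs)  # 471, matching '471/471' in the log
```

The same arithmetic explains the throughput line: 471 steps at ~32 examples each over 5148 s is roughly the logged 2.914 train_samples_per_second.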