diff --git "a/checkpoints/main.log" "b/checkpoints/main.log"
new file mode 100644
--- /dev/null
+++ "b/checkpoints/main.log"
@@ -0,0 +1,891 @@
+[2024-09-26 05:19:51,718][Main][INFO] - Distributed environment: DistributedType.NO
+Num processes: 1
+Process index: 0
+Local process index: 0
+Device: cuda
+
+Mixed precision type: bf16
+
+[2024-09-26 05:19:51,719][Main][INFO] - Working directory is /workspace/nanoT5/outputs/2024-09-26/05-19-51
+[2024-09-26 05:27:22,340][Main][INFO] - [train] Step 25 out of 20000 | Loss --> 4.134 | Grad_l2 --> 1.536 | Weights_l2 --> 10862.729 | Lr --> 0.005 | Seconds_per_step --> 15.656 |
+[2024-09-26 05:30:00,911][Main][INFO] - [train] Step 50 out of 20000 | Loss --> 2.720 | Grad_l2 --> 1.020 | Weights_l2 --> 10862.560 | Lr --> 0.005 | Seconds_per_step --> 6.343 |
+[2024-09-26 05:32:39,556][Main][INFO] - [train] Step 75 out of 20000 | Loss --> 2.177 | Grad_l2 --> 0.556 | Weights_l2 --> 10862.480 | Lr --> 0.005 | Seconds_per_step --> 6.346 |
+[2024-09-26 05:35:20,390][Main][INFO] - [train] Step 100 out of 20000 | Loss --> 2.053 | Grad_l2 --> 0.431 | Weights_l2 --> 10862.439 | Lr --> 0.005 | Seconds_per_step --> 6.433 |
+[2024-09-26 05:37:59,248][Main][INFO] - [train] Step 125 out of 20000 | Loss --> 2.023 | Grad_l2 --> 0.391 | Weights_l2 --> 10862.419 | Lr --> 0.005 | Seconds_per_step --> 6.354 |
+[2024-09-26 05:40:38,045][Main][INFO] - [train] Step 150 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.379 | Weights_l2 --> 10862.409 | Lr --> 0.005 | Seconds_per_step --> 6.352 |
+[2024-09-26 05:43:16,853][Main][INFO] - [train] Step 175 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.345 | Weights_l2 --> 10862.419 | Lr --> 0.005 | Seconds_per_step --> 6.352 |
+[2024-09-26 05:45:55,712][Main][INFO] - [train] Step 200 out of 20000 | Loss --> 1.955 | Grad_l2 --> 0.366 | Weights_l2 --> 10862.431 | Lr --> 0.005 | Seconds_per_step --> 6.354 |
+[2024-09-26 05:48:36,149][Main][INFO] - [train] Step 225 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.325 |
Weights_l2 --> 10862.469 | Lr --> 0.005 | Seconds_per_step --> 6.417 | +[2024-09-26 05:51:14,966][Main][INFO] - [train] Step 250 out of 20000 | Loss --> 1.961 | Grad_l2 --> 0.303 | Weights_l2 --> 10862.504 | Lr --> 0.005 | Seconds_per_step --> 6.353 | +[2024-09-26 05:53:53,810][Main][INFO] - [train] Step 275 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.296 | Weights_l2 --> 10862.547 | Lr --> 0.005 | Seconds_per_step --> 6.354 | +[2024-09-26 05:56:32,608][Main][INFO] - [train] Step 300 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.304 | Weights_l2 --> 10862.600 | Lr --> 0.005 | Seconds_per_step --> 6.352 | +[2024-09-26 05:59:13,160][Main][INFO] - [train] Step 325 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.305 | Weights_l2 --> 10862.669 | Lr --> 0.005 | Seconds_per_step --> 6.422 | +[2024-09-26 06:01:52,050][Main][INFO] - [train] Step 350 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.301 | Weights_l2 --> 10862.748 | Lr --> 0.005 | Seconds_per_step --> 6.356 | +[2024-09-26 06:04:31,166][Main][INFO] - [train] Step 375 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.366 | Weights_l2 --> 10862.834 | Lr --> 0.005 | Seconds_per_step --> 6.365 | +[2024-09-26 06:07:09,947][Main][INFO] - [train] Step 400 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.308 | Weights_l2 --> 10862.926 | Lr --> 0.005 | Seconds_per_step --> 6.351 | +[2024-09-26 06:09:50,377][Main][INFO] - [train] Step 425 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.297 | Weights_l2 --> 10863.024 | Lr --> 0.005 | Seconds_per_step --> 6.417 | +[2024-09-26 06:12:29,254][Main][INFO] - [train] Step 450 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.304 | Weights_l2 --> 10863.135 | Lr --> 0.005 | Seconds_per_step --> 6.355 | +[2024-09-26 06:15:07,929][Main][INFO] - [train] Step 475 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.343 | Weights_l2 --> 10863.254 | Lr --> 0.005 | Seconds_per_step --> 6.347 | +[2024-09-26 06:17:46,772][Main][INFO] - [train] Step 500 out of 20000 | Loss --> 1.932 | Grad_l2 --> 0.323 | 
Weights_l2 --> 10863.372 | Lr --> 0.005 | Seconds_per_step --> 6.354 | +[2024-09-26 06:20:27,307][Main][INFO] - [train] Step 525 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.318 | Weights_l2 --> 10863.506 | Lr --> 0.006 | Seconds_per_step --> 6.421 | +[2024-09-26 06:23:06,037][Main][INFO] - [train] Step 550 out of 20000 | Loss --> 1.918 | Grad_l2 --> 0.325 | Weights_l2 --> 10863.635 | Lr --> 0.006 | Seconds_per_step --> 6.349 | +[2024-09-26 06:25:44,794][Main][INFO] - [train] Step 575 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.324 | Weights_l2 --> 10863.773 | Lr --> 0.006 | Seconds_per_step --> 6.350 | +[2024-09-26 06:28:23,560][Main][INFO] - [train] Step 600 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.304 | Weights_l2 --> 10863.918 | Lr --> 0.006 | Seconds_per_step --> 6.351 | +[2024-09-26 06:31:03,967][Main][INFO] - [train] Step 625 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.325 | Weights_l2 --> 10864.057 | Lr --> 0.006 | Seconds_per_step --> 6.416 | +[2024-09-26 06:33:42,806][Main][INFO] - [train] Step 650 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.306 | Weights_l2 --> 10864.205 | Lr --> 0.006 | Seconds_per_step --> 6.353 | +[2024-09-26 06:36:21,760][Main][INFO] - [train] Step 675 out of 20000 | Loss --> 1.920 | Grad_l2 --> 0.306 | Weights_l2 --> 10864.363 | Lr --> 0.006 | Seconds_per_step --> 6.358 | +[2024-09-26 06:39:00,624][Main][INFO] - [train] Step 700 out of 20000 | Loss --> 1.926 | Grad_l2 --> 0.332 | Weights_l2 --> 10864.529 | Lr --> 0.006 | Seconds_per_step --> 6.354 | +[2024-09-26 06:41:39,400][Main][INFO] - [train] Step 725 out of 20000 | Loss --> 1.925 | Grad_l2 --> 0.348 | Weights_l2 --> 10864.713 | Lr --> 0.006 | Seconds_per_step --> 6.351 | +[2024-09-26 06:44:19,655][Main][INFO] - [train] Step 750 out of 20000 | Loss --> 1.923 | Grad_l2 --> 0.335 | Weights_l2 --> 10864.900 | Lr --> 0.006 | Seconds_per_step --> 6.410 | +[2024-09-26 06:46:58,432][Main][INFO] - [train] Step 775 out of 20000 | Loss --> 1.924 | Grad_l2 --> 0.294 | 
Weights_l2 --> 10865.080 | Lr --> 0.006 | Seconds_per_step --> 6.351 | +[2024-09-26 06:49:37,388][Main][INFO] - [train] Step 800 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.338 | Weights_l2 --> 10865.277 | Lr --> 0.006 | Seconds_per_step --> 6.358 | +[2024-09-26 06:52:16,424][Main][INFO] - [train] Step 825 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.330 | Weights_l2 --> 10865.480 | Lr --> 0.006 | Seconds_per_step --> 6.361 | +[2024-09-26 06:54:57,348][Main][INFO] - [train] Step 850 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.355 | Weights_l2 --> 10865.685 | Lr --> 0.006 | Seconds_per_step --> 6.437 | +[2024-09-26 06:57:36,473][Main][INFO] - [train] Step 875 out of 20000 | Loss --> 1.920 | Grad_l2 --> 0.319 | Weights_l2 --> 10865.898 | Lr --> 0.006 | Seconds_per_step --> 6.365 | +[2024-09-26 07:00:15,656][Main][INFO] - [train] Step 900 out of 20000 | Loss --> 1.923 | Grad_l2 --> 0.306 | Weights_l2 --> 10866.107 | Lr --> 0.006 | Seconds_per_step --> 6.367 | +[2024-09-26 07:02:54,612][Main][INFO] - [train] Step 925 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.310 | Weights_l2 --> 10866.321 | Lr --> 0.006 | Seconds_per_step --> 6.358 | +[2024-09-26 07:05:34,896][Main][INFO] - [train] Step 950 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.341 | Weights_l2 --> 10866.555 | Lr --> 0.006 | Seconds_per_step --> 6.411 | +[2024-09-26 07:08:13,660][Main][INFO] - [train] Step 975 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.300 | Weights_l2 --> 10866.786 | Lr --> 0.006 | Seconds_per_step --> 6.350 | +[2024-09-26 07:10:52,641][Main][INFO] - [train] Step 1000 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.304 | Weights_l2 --> 10867.026 | Lr --> 0.006 | Seconds_per_step --> 6.359 | +[2024-09-26 07:13:31,460][Main][INFO] - [train] Step 1025 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.345 | Weights_l2 --> 10867.259 | Lr --> 0.006 | Seconds_per_step --> 6.353 | +[2024-09-26 07:16:10,306][Main][INFO] - [train] Step 1050 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.315 | 
Weights_l2 --> 10867.517 | Lr --> 0.006 | Seconds_per_step --> 6.354 | +[2024-09-26 07:18:51,027][Main][INFO] - [train] Step 1075 out of 20000 | Loss --> 1.921 | Grad_l2 --> 0.388 | Weights_l2 --> 10867.773 | Lr --> 0.006 | Seconds_per_step --> 6.429 | +[2024-09-26 07:21:29,834][Main][INFO] - [train] Step 1100 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.338 | Weights_l2 --> 10868.042 | Lr --> 0.006 | Seconds_per_step --> 6.352 | +[2024-09-26 07:24:08,412][Main][INFO] - [train] Step 1125 out of 20000 | Loss --> 1.939 | Grad_l2 --> 0.348 | Weights_l2 --> 10868.305 | Lr --> 0.006 | Seconds_per_step --> 6.343 | +[2024-09-26 07:26:47,337][Main][INFO] - [train] Step 1150 out of 20000 | Loss --> 1.918 | Grad_l2 --> 0.292 | Weights_l2 --> 10868.565 | Lr --> 0.006 | Seconds_per_step --> 6.357 | +[2024-09-26 07:29:27,835][Main][INFO] - [train] Step 1175 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.328 | Weights_l2 --> 10868.835 | Lr --> 0.006 | Seconds_per_step --> 6.420 | +[2024-09-26 07:32:06,718][Main][INFO] - [train] Step 1200 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.291 | Weights_l2 --> 10869.121 | Lr --> 0.006 | Seconds_per_step --> 6.355 | +[2024-09-26 07:34:45,777][Main][INFO] - [train] Step 1225 out of 20000 | Loss --> 1.932 | Grad_l2 --> 0.365 | Weights_l2 --> 10869.410 | Lr --> 0.006 | Seconds_per_step --> 6.362 | +[2024-09-26 07:37:24,754][Main][INFO] - [train] Step 1250 out of 20000 | Loss --> 1.926 | Grad_l2 --> 0.297 | Weights_l2 --> 10869.697 | Lr --> 0.006 | Seconds_per_step --> 6.359 | +[2024-09-26 07:40:05,305][Main][INFO] - [train] Step 1275 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.339 | Weights_l2 --> 10869.983 | Lr --> 0.006 | Seconds_per_step --> 6.422 | +[2024-09-26 07:42:44,642][Main][INFO] - [train] Step 1300 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.294 | Weights_l2 --> 10870.280 | Lr --> 0.006 | Seconds_per_step --> 6.373 | +[2024-09-26 07:44:04,429][huggingface_hub.utils._http][WARNING] - 
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /datasets/HuggingFaceTB/smollm-corpus/resolve/3ba9d605774198c5868892d7a8deda78031a781f/fineweb-edu-dedup/train-00103-of-00234.parquet (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 49397c11-3235-4ca1-8215-8cc7ecb9c7cd)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/3ba9d605774198c5868892d7a8deda78031a781f/fineweb-edu-dedup/train-00103-of-00234.parquet +[2024-09-26 07:44:04,449][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. +[2024-09-26 07:45:42,098][Main][INFO] - [train] Step 1325 out of 20000 | Loss --> 1.918 | Grad_l2 --> 0.322 | Weights_l2 --> 10870.587 | Lr --> 0.006 | Seconds_per_step --> 7.098 | +[2024-09-26 07:46:13,305][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /datasets/HuggingFaceTB/smollm-corpus/resolve/3ba9d605774198c5868892d7a8deda78031a781f/fineweb-edu-dedup/train-00099-of-00234.parquet (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: bd8e6877-7d6d-40af-87fb-0dae0f16cdb6)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/3ba9d605774198c5868892d7a8deda78031a781f/fineweb-edu-dedup/train-00099-of-00234.parquet +[2024-09-26 07:46:13,319][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. 
+[2024-09-26 07:46:54,493][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='cdn-lfs-us-1.hf.co', port=443): Max retries exceeded with url: /repos/84/60/8460350cc6561c1bfe01747770f8ad9c967e0591b55575c003393c96dac388d7/f22a1942f98e9a5805f3454915b837afa0f290b9a65010e6b14380e0a8582324?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27train-00099-of-00234.parquet%3B+filename%3D%22train-00099-of-00234.parquet%22%3B&Expires=1727595974&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcyNzU5NTk3NH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzg0LzYwLzg0NjAzNTBjYzY1NjFjMWJmZTAxNzQ3NzcwZjhhZDljOTY3ZTA1OTFiNTU1NzVjMDAzMzkzYzk2ZGFjMzg4ZDcvZjIyYTE5NDJmOThlOWE1ODA1ZjM0NTQ5MTViODM3YWZhMGYyOTBiOWE2NTAxMGU2YjE0MzgwZTBhODU4MjMyND9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&Signature=atQd9KcbC6xxv-gSTlAd1q1k8YNWebxcV5mfU-8yEEF8kH4f5m-BDqVvfcMgc8edLNLmD8mBJgQ1ZLglzL~eXVarqhGdy8MsW~~Az8SQsxZ-gR44vN-gX6e2qrX5blKVZMtBVlNNY8lhpiPdnNOtJSxE0SchDDWcYUpgHhwY0r9ZBFJ9-RS66HCiImRlk0eOyZI-bp~-uvCN-GqBz~dr-x-DK0Rqi8~1eiq~eLdfsxS2Kw3hXfZfQj8tsCTisXT6WSVnR6-MB-jtXKFMtxGttrMiACG1dpeLgfhwPY0DHAhaZOERyyNb9r4Lvk2PqmHmKM~Fz-APm2zvg-R68n1vhQ__&Key-Pair-Id=K24J24Z295AEI9 (Caused by ConnectTimeoutError(, 'Connection to cdn-lfs-us-1.hf.co timed out. (connect timeout=10)'))"), '(Request ID: 40a995b3-1f10-4ce4-bf8f-7b30954ec15a)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/3ba9d605774198c5868892d7a8deda78031a781f/fineweb-edu-dedup/train-00099-of-00234.parquet +[2024-09-26 07:46:54,494][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. 
+[2024-09-26 07:49:21,357][Main][INFO] - [train] Step 1350 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.311 | Weights_l2 --> 10870.889 | Lr --> 0.006 | Seconds_per_step --> 8.770 | +[2024-09-26 07:52:01,578][Main][INFO] - [train] Step 1375 out of 20000 | Loss --> 1.911 | Grad_l2 --> 0.328 | Weights_l2 --> 10871.186 | Lr --> 0.006 | Seconds_per_step --> 6.409 | +[2024-09-26 07:54:40,482][Main][INFO] - [train] Step 1400 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.339 | Weights_l2 --> 10871.515 | Lr --> 0.006 | Seconds_per_step --> 6.356 | +[2024-09-26 07:57:19,323][Main][INFO] - [train] Step 1425 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.312 | Weights_l2 --> 10871.842 | Lr --> 0.006 | Seconds_per_step --> 6.354 | +[2024-09-26 07:59:58,377][Main][INFO] - [train] Step 1450 out of 20000 | Loss --> 1.934 | Grad_l2 --> 0.292 | Weights_l2 --> 10872.159 | Lr --> 0.006 | Seconds_per_step --> 6.362 | +[2024-09-26 08:02:38,855][Main][INFO] - [train] Step 1475 out of 20000 | Loss --> 1.921 | Grad_l2 --> 0.321 | Weights_l2 --> 10872.510 | Lr --> 0.006 | Seconds_per_step --> 6.419 | +[2024-09-26 08:05:17,753][Main][INFO] - [train] Step 1500 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.359 | Weights_l2 --> 10872.845 | Lr --> 0.006 | Seconds_per_step --> 6.356 | +[2024-09-26 08:07:56,523][Main][INFO] - [train] Step 1525 out of 20000 | Loss --> 1.937 | Grad_l2 --> 0.300 | Weights_l2 --> 10873.187 | Lr --> 0.007 | Seconds_per_step --> 6.351 | +[2024-09-26 08:10:35,509][Main][INFO] - [train] Step 1550 out of 20000 | Loss --> 1.916 | Grad_l2 --> 0.292 | Weights_l2 --> 10873.530 | Lr --> 0.007 | Seconds_per_step --> 6.359 | +[2024-09-26 08:13:14,414][Main][INFO] - [train] Step 1575 out of 20000 | Loss --> 1.924 | Grad_l2 --> 0.316 | Weights_l2 --> 10873.872 | Lr --> 0.007 | Seconds_per_step --> 6.356 | +[2024-09-26 08:15:54,946][Main][INFO] - [train] Step 1600 out of 20000 | Loss --> 1.911 | Grad_l2 --> 0.330 | Weights_l2 --> 10874.224 | Lr --> 0.007 | Seconds_per_step --> 6.421 
| +[2024-09-26 08:18:33,786][Main][INFO] - [train] Step 1625 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.302 | Weights_l2 --> 10874.570 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 08:21:12,682][Main][INFO] - [train] Step 1650 out of 20000 | Loss --> 1.924 | Grad_l2 --> 0.353 | Weights_l2 --> 10874.941 | Lr --> 0.007 | Seconds_per_step --> 6.356 | +[2024-09-26 08:23:51,615][Main][INFO] - [train] Step 1675 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.335 | Weights_l2 --> 10875.316 | Lr --> 0.007 | Seconds_per_step --> 6.357 | +[2024-09-26 08:26:32,360][Main][INFO] - [train] Step 1700 out of 20000 | Loss --> 1.919 | Grad_l2 --> 0.311 | Weights_l2 --> 10875.680 | Lr --> 0.007 | Seconds_per_step --> 6.430 | +[2024-09-26 08:29:11,124][Main][INFO] - [train] Step 1725 out of 20000 | Loss --> 1.926 | Grad_l2 --> 0.340 | Weights_l2 --> 10876.053 | Lr --> 0.007 | Seconds_per_step --> 6.350 | +[2024-09-26 08:31:49,872][Main][INFO] - [train] Step 1750 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.306 | Weights_l2 --> 10876.446 | Lr --> 0.007 | Seconds_per_step --> 6.350 | +[2024-09-26 08:34:28,712][Main][INFO] - [train] Step 1775 out of 20000 | Loss --> 1.914 | Grad_l2 --> 0.346 | Weights_l2 --> 10876.831 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 08:37:08,887][Main][INFO] - [train] Step 1800 out of 20000 | Loss --> 1.937 | Grad_l2 --> 0.320 | Weights_l2 --> 10877.212 | Lr --> 0.007 | Seconds_per_step --> 6.407 | +[2024-09-26 08:39:47,648][Main][INFO] - [train] Step 1825 out of 20000 | Loss --> 1.926 | Grad_l2 --> 0.342 | Weights_l2 --> 10877.625 | Lr --> 0.007 | Seconds_per_step --> 6.350 | +[2024-09-26 08:42:26,496][Main][INFO] - [train] Step 1850 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.348 | Weights_l2 --> 10878.028 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 08:45:05,377][Main][INFO] - [train] Step 1875 out of 20000 | Loss --> 1.925 | Grad_l2 --> 0.295 | Weights_l2 --> 10878.434 | Lr --> 0.007 | Seconds_per_step --> 
6.355 | +[2024-09-26 08:47:45,616][Main][INFO] - [train] Step 1900 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.327 | Weights_l2 --> 10878.833 | Lr --> 0.007 | Seconds_per_step --> 6.409 | +[2024-09-26 08:50:24,444][Main][INFO] - [train] Step 1925 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.322 | Weights_l2 --> 10879.239 | Lr --> 0.007 | Seconds_per_step --> 6.353 | +[2024-09-26 08:53:03,308][Main][INFO] - [train] Step 1950 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.366 | Weights_l2 --> 10879.675 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 08:55:42,191][Main][INFO] - [train] Step 1975 out of 20000 | Loss --> 1.921 | Grad_l2 --> 0.309 | Weights_l2 --> 10880.106 | Lr --> 0.007 | Seconds_per_step --> 6.355 | +[2024-09-26 08:58:21,124][Main][INFO] - [train] Step 2000 out of 20000 | Loss --> 1.924 | Grad_l2 --> 0.335 | Weights_l2 --> 10880.519 | Lr --> 0.007 | Seconds_per_step --> 6.357 | +[2024-09-26 09:01:01,462][Main][INFO] - [train] Step 2025 out of 20000 | Loss --> 1.925 | Grad_l2 --> 0.334 | Weights_l2 --> 10880.950 | Lr --> 0.007 | Seconds_per_step --> 6.413 | +[2024-09-26 09:03:40,264][Main][INFO] - [train] Step 2050 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.322 | Weights_l2 --> 10881.378 | Lr --> 0.007 | Seconds_per_step --> 6.352 | +[2024-09-26 09:06:19,141][Main][INFO] - [train] Step 2075 out of 20000 | Loss --> 1.914 | Grad_l2 --> 0.337 | Weights_l2 --> 10881.828 | Lr --> 0.007 | Seconds_per_step --> 6.355 | +[2024-09-26 09:08:57,995][Main][INFO] - [train] Step 2100 out of 20000 | Loss --> 1.935 | Grad_l2 --> 0.340 | Weights_l2 --> 10882.271 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 09:11:38,341][Main][INFO] - [train] Step 2125 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.316 | Weights_l2 --> 10882.724 | Lr --> 0.007 | Seconds_per_step --> 6.414 | +[2024-09-26 09:14:17,334][Main][INFO] - [train] Step 2150 out of 20000 | Loss --> 1.924 | Grad_l2 --> 0.360 | Weights_l2 --> 10883.187 | Lr --> 0.007 | Seconds_per_step 
--> 6.360 | +[2024-09-26 09:16:56,337][Main][INFO] - [train] Step 2175 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.346 | Weights_l2 --> 10883.659 | Lr --> 0.007 | Seconds_per_step --> 6.360 | +[2024-09-26 09:19:35,469][Main][INFO] - [train] Step 2200 out of 20000 | Loss --> 1.932 | Grad_l2 --> 0.340 | Weights_l2 --> 10884.128 | Lr --> 0.007 | Seconds_per_step --> 6.365 | +[2024-09-26 09:22:15,920][Main][INFO] - [train] Step 2225 out of 20000 | Loss --> 1.934 | Grad_l2 --> 0.351 | Weights_l2 --> 10884.594 | Lr --> 0.007 | Seconds_per_step --> 6.418 | +[2024-09-26 09:24:54,719][Main][INFO] - [train] Step 2250 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.330 | Weights_l2 --> 10885.074 | Lr --> 0.007 | Seconds_per_step --> 6.352 | +[2024-09-26 09:27:33,363][Main][INFO] - [train] Step 2275 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.369 | Weights_l2 --> 10885.570 | Lr --> 0.007 | Seconds_per_step --> 6.346 | +[2024-09-26 09:30:12,271][Main][INFO] - [train] Step 2300 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.308 | Weights_l2 --> 10886.038 | Lr --> 0.007 | Seconds_per_step --> 6.356 | +[2024-09-26 09:32:52,939][Main][INFO] - [train] Step 2325 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.325 | Weights_l2 --> 10886.513 | Lr --> 0.007 | Seconds_per_step --> 6.427 | +[2024-09-26 09:35:31,794][Main][INFO] - [train] Step 2350 out of 20000 | Loss --> 1.921 | Grad_l2 --> 0.344 | Weights_l2 --> 10887.006 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 09:38:11,021][Main][INFO] - [train] Step 2375 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.337 | Weights_l2 --> 10887.485 | Lr --> 0.007 | Seconds_per_step --> 6.369 | +[2024-09-26 09:40:49,998][Main][INFO] - [train] Step 2400 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.357 | Weights_l2 --> 10888.000 | Lr --> 0.007 | Seconds_per_step --> 6.359 | +[2024-09-26 09:43:29,011][Main][INFO] - [train] Step 2425 out of 20000 | Loss --> 1.939 | Grad_l2 --> 0.319 | Weights_l2 --> 10888.494 | Lr --> 0.007 | 
Seconds_per_step --> 6.360 | +[2024-09-26 09:46:09,341][Main][INFO] - [train] Step 2450 out of 20000 | Loss --> 1.917 | Grad_l2 --> 0.357 | Weights_l2 --> 10889.021 | Lr --> 0.007 | Seconds_per_step --> 6.413 | +[2024-09-26 09:48:48,227][Main][INFO] - [train] Step 2475 out of 20000 | Loss --> 1.947 | Grad_l2 --> 0.338 | Weights_l2 --> 10889.545 | Lr --> 0.007 | Seconds_per_step --> 6.355 | +[2024-09-26 09:51:27,086][Main][INFO] - [train] Step 2500 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.375 | Weights_l2 --> 10890.087 | Lr --> 0.007 | Seconds_per_step --> 6.354 | +[2024-09-26 09:51:27,087][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-2500 +[2024-09-26 09:51:27,094][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-26 09:51:33,850][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-2500/model.safetensors +[2024-09-26 09:51:42,450][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-2500/optimizer.bin +[2024-09-26 09:51:42,452][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-2500/scheduler.bin +[2024-09-26 09:51:42,452][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-2500/sampler.bin +[2024-09-26 09:51:42,453][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-2500/sampler_1.bin +[2024-09-26 09:51:42,454][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-2500/random_states_0.pkl +[2024-09-26 09:54:21,001][Main][INFO] - [train] Step 2525 out of 20000 | Loss --> 1.936 | Grad_l2 --> 0.338 | Weights_l2 --> 10890.603 | Lr --> 0.008 | Seconds_per_step --> 6.957 | +[2024-09-26 09:57:01,541][Main][INFO] - [train] Step 2550 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.301 | Weights_l2 --> 10891.135 | 
Lr --> 0.008 | Seconds_per_step --> 6.422 | +[2024-09-26 09:59:40,436][Main][INFO] - [train] Step 2575 out of 20000 | Loss --> 1.947 | Grad_l2 --> 0.360 | Weights_l2 --> 10891.666 | Lr --> 0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 10:02:19,300][Main][INFO] - [train] Step 2600 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.367 | Weights_l2 --> 10892.199 | Lr --> 0.008 | Seconds_per_step --> 6.354 | +[2024-09-26 10:04:58,156][Main][INFO] - [train] Step 2625 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.318 | Weights_l2 --> 10892.744 | Lr --> 0.008 | Seconds_per_step --> 6.354 | +[2024-09-26 10:07:38,459][Main][INFO] - [train] Step 2650 out of 20000 | Loss --> 1.944 | Grad_l2 --> 0.397 | Weights_l2 --> 10893.287 | Lr --> 0.008 | Seconds_per_step --> 6.412 | +[2024-09-26 10:10:17,036][Main][INFO] - [train] Step 2675 out of 20000 | Loss --> 1.939 | Grad_l2 --> 0.317 | Weights_l2 --> 10893.838 | Lr --> 0.008 | Seconds_per_step --> 6.343 | +[2024-09-26 10:12:55,947][Main][INFO] - [train] Step 2700 out of 20000 | Loss --> 1.942 | Grad_l2 --> 0.374 | Weights_l2 --> 10894.396 | Lr --> 0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 10:15:34,826][Main][INFO] - [train] Step 2725 out of 20000 | Loss --> 1.950 | Grad_l2 --> 0.362 | Weights_l2 --> 10894.969 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 10:18:15,248][Main][INFO] - [train] Step 2750 out of 20000 | Loss --> 1.947 | Grad_l2 --> 0.368 | Weights_l2 --> 10895.548 | Lr --> 0.008 | Seconds_per_step --> 6.417 | +[2024-09-26 10:20:54,113][Main][INFO] - [train] Step 2775 out of 20000 | Loss --> 1.941 | Grad_l2 --> 0.341 | Weights_l2 --> 10896.113 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 10:23:33,112][Main][INFO] - [train] Step 2800 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.317 | Weights_l2 --> 10896.685 | Lr --> 0.008 | Seconds_per_step --> 6.360 | +[2024-09-26 10:26:11,968][Main][INFO] - [train] Step 2825 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.337 | Weights_l2 --> 10897.278 
| Lr --> 0.008 | Seconds_per_step --> 6.354 | +[2024-09-26 10:28:51,046][Main][INFO] - [train] Step 2850 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.383 | Weights_l2 --> 10897.870 | Lr --> 0.008 | Seconds_per_step --> 6.363 | +[2024-09-26 10:31:32,024][Main][INFO] - [train] Step 2875 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.332 | Weights_l2 --> 10898.450 | Lr --> 0.008 | Seconds_per_step --> 6.439 | +[2024-09-26 10:34:11,244][Main][INFO] - [train] Step 2900 out of 20000 | Loss --> 1.931 | Grad_l2 --> 0.369 | Weights_l2 --> 10899.034 | Lr --> 0.008 | Seconds_per_step --> 6.369 | +[2024-09-26 10:36:50,140][Main][INFO] - [train] Step 2925 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.359 | Weights_l2 --> 10899.635 | Lr --> 0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 10:39:29,089][Main][INFO] - [train] Step 2950 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.287 | Weights_l2 --> 10900.223 | Lr --> 0.008 | Seconds_per_step --> 6.358 | +[2024-09-26 10:42:09,613][Main][INFO] - [train] Step 2975 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.405 | Weights_l2 --> 10900.847 | Lr --> 0.008 | Seconds_per_step --> 6.421 | +[2024-09-26 10:44:48,599][Main][INFO] - [train] Step 3000 out of 20000 | Loss --> 1.956 | Grad_l2 --> 0.311 | Weights_l2 --> 10901.457 | Lr --> 0.008 | Seconds_per_step --> 6.359 | +[2024-09-26 10:47:27,367][Main][INFO] - [train] Step 3025 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.387 | Weights_l2 --> 10902.080 | Lr --> 0.008 | Seconds_per_step --> 6.351 | +[2024-09-26 10:50:06,285][Main][INFO] - [train] Step 3050 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.337 | Weights_l2 --> 10902.707 | Lr --> 0.008 | Seconds_per_step --> 6.357 | +[2024-09-26 10:52:47,045][Main][INFO] - [train] Step 3075 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.376 | Weights_l2 --> 10903.314 | Lr --> 0.008 | Seconds_per_step --> 6.430 | +[2024-09-26 10:55:26,161][Main][INFO] - [train] Step 3100 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.354 | Weights_l2 --> 
10903.952 | Lr --> 0.008 | Seconds_per_step --> 6.365 | +[2024-09-26 10:58:05,147][Main][INFO] - [train] Step 3125 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.301 | Weights_l2 --> 10904.580 | Lr --> 0.008 | Seconds_per_step --> 6.359 | +[2024-09-26 11:00:43,966][Main][INFO] - [train] Step 3150 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.333 | Weights_l2 --> 10905.209 | Lr --> 0.008 | Seconds_per_step --> 6.353 | +[2024-09-26 11:03:24,573][Main][INFO] - [train] Step 3175 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.427 | Weights_l2 --> 10905.872 | Lr --> 0.008 | Seconds_per_step --> 6.424 | +[2024-09-26 11:06:03,444][Main][INFO] - [train] Step 3200 out of 20000 | Loss --> 1.950 | Grad_l2 --> 0.363 | Weights_l2 --> 10906.523 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 11:08:42,330][Main][INFO] - [train] Step 3225 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.300 | Weights_l2 --> 10907.189 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 11:11:21,223][Main][INFO] - [train] Step 3250 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.303 | Weights_l2 --> 10907.846 | Lr --> 0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 11:14:00,473][Main][INFO] - [train] Step 3275 out of 20000 | Loss --> 1.934 | Grad_l2 --> 0.367 | Weights_l2 --> 10908.504 | Lr --> 0.008 | Seconds_per_step --> 6.370 | +[2024-09-26 11:16:40,832][Main][INFO] - [train] Step 3300 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.289 | Weights_l2 --> 10909.172 | Lr --> 0.008 | Seconds_per_step --> 6.414 | +[2024-09-26 11:19:19,803][Main][INFO] - [train] Step 3325 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.407 | Weights_l2 --> 10909.850 | Lr --> 0.008 | Seconds_per_step --> 6.359 | +[2024-09-26 11:21:58,712][Main][INFO] - [train] Step 3350 out of 20000 | Loss --> 1.942 | Grad_l2 --> 0.364 | Weights_l2 --> 10910.529 | Lr --> 0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 11:24:37,769][Main][INFO] - [train] Step 3375 out of 20000 | Loss --> 1.959 | Grad_l2 --> 0.308 | Weights_l2 
--> 10911.227 | Lr --> 0.008 | Seconds_per_step --> 6.362 | +[2024-09-26 11:27:18,227][Main][INFO] - [train] Step 3400 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.355 | Weights_l2 --> 10911.910 | Lr --> 0.008 | Seconds_per_step --> 6.418 | +[2024-09-26 11:29:56,805][Main][INFO] - [train] Step 3425 out of 20000 | Loss --> 1.950 | Grad_l2 --> 0.324 | Weights_l2 --> 10912.599 | Lr --> 0.008 | Seconds_per_step --> 6.343 | +[2024-09-26 11:32:35,728][Main][INFO] - [train] Step 3450 out of 20000 | Loss --> 1.944 | Grad_l2 --> 0.385 | Weights_l2 --> 10913.301 | Lr --> 0.008 | Seconds_per_step --> 6.357 | +[2024-09-26 11:35:14,627][Main][INFO] - [train] Step 3475 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.313 | Weights_l2 --> 10913.997 | Lr --> 0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 11:37:55,050][Main][INFO] - [train] Step 3500 out of 20000 | Loss --> 1.942 | Grad_l2 --> 0.377 | Weights_l2 --> 10914.701 | Lr --> 0.008 | Seconds_per_step --> 6.417 | +[2024-09-26 11:40:33,876][Main][INFO] - [train] Step 3525 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.369 | Weights_l2 --> 10915.418 | Lr --> 0.009 | Seconds_per_step --> 6.353 | +[2024-09-26 11:43:12,747][Main][INFO] - [train] Step 3550 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.285 | Weights_l2 --> 10916.129 | Lr --> 0.009 | Seconds_per_step --> 6.355 | +[2024-09-26 11:45:51,922][Main][INFO] - [train] Step 3575 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.305 | Weights_l2 --> 10916.844 | Lr --> 0.009 | Seconds_per_step --> 6.367 | +[2024-09-26 11:48:31,052][Main][INFO] - [train] Step 3600 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.345 | Weights_l2 --> 10917.554 | Lr --> 0.009 | Seconds_per_step --> 6.365 | +[2024-09-26 11:51:11,327][Main][INFO] - [train] Step 3625 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.323 | Weights_l2 --> 10918.305 | Lr --> 0.009 | Seconds_per_step --> 6.411 | +[2024-09-26 11:53:50,626][Main][INFO] - [train] Step 3650 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.315 | 
Weights_l2 --> 10919.054 | Lr --> 0.009 | Seconds_per_step --> 6.372 | +[2024-09-26 11:56:29,843][Main][INFO] - [train] Step 3675 out of 20000 | Loss --> 1.942 | Grad_l2 --> 0.321 | Weights_l2 --> 10919.795 | Lr --> 0.009 | Seconds_per_step --> 6.369 | +[2024-09-26 11:59:08,822][Main][INFO] - [train] Step 3700 out of 20000 | Loss --> 1.941 | Grad_l2 --> 0.301 | Weights_l2 --> 10920.525 | Lr --> 0.009 | Seconds_per_step --> 6.359 | +[2024-09-26 12:01:49,270][Main][INFO] - [train] Step 3725 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.448 | Weights_l2 --> 10921.260 | Lr --> 0.009 | Seconds_per_step --> 6.418 | +[2024-09-26 12:04:28,244][Main][INFO] - [train] Step 3750 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.461 | Weights_l2 --> 10922.017 | Lr --> 0.009 | Seconds_per_step --> 6.359 | +[2024-09-26 12:07:07,320][Main][INFO] - [train] Step 3775 out of 20000 | Loss --> 1.944 | Grad_l2 --> 0.294 | Weights_l2 --> 10922.780 | Lr --> 0.009 | Seconds_per_step --> 6.363 | +[2024-09-26 12:09:46,357][Main][INFO] - [train] Step 3800 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.272 | Weights_l2 --> 10923.509 | Lr --> 0.009 | Seconds_per_step --> 6.361 | +[2024-09-26 12:12:26,895][Main][INFO] - [train] Step 3825 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.290 | Weights_l2 --> 10924.275 | Lr --> 0.009 | Seconds_per_step --> 6.421 | +[2024-09-26 12:15:06,022][Main][INFO] - [train] Step 3850 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.290 | Weights_l2 --> 10925.034 | Lr --> 0.009 | Seconds_per_step --> 6.365 | +[2024-09-26 12:17:45,013][Main][INFO] - [train] Step 3875 out of 20000 | Loss --> 1.950 | Grad_l2 --> 0.347 | Weights_l2 --> 10925.849 | Lr --> 0.009 | Seconds_per_step --> 6.360 | +[2024-09-26 12:20:24,089][Main][INFO] - [train] Step 3900 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.276 | Weights_l2 --> 10926.630 | Lr --> 0.009 | Seconds_per_step --> 6.363 | +[2024-09-26 12:23:05,117][Main][INFO] - [train] Step 3925 out of 20000 | Loss --> 1.956 | Grad_l2 --> 0.293 
| Weights_l2 --> 10927.402 | Lr --> 0.009 | Seconds_per_step --> 6.441 | +[2024-09-26 12:25:44,235][Main][INFO] - [train] Step 3950 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.346 | Weights_l2 --> 10928.194 | Lr --> 0.009 | Seconds_per_step --> 6.365 | +[2024-09-26 12:28:23,212][Main][INFO] - [train] Step 3975 out of 20000 | Loss --> 1.947 | Grad_l2 --> 0.361 | Weights_l2 --> 10928.998 | Lr --> 0.009 | Seconds_per_step --> 6.359 | +[2024-09-26 12:31:02,226][Main][INFO] - [train] Step 4000 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.290 | Weights_l2 --> 10929.807 | Lr --> 0.009 | Seconds_per_step --> 6.360 | +[2024-09-26 12:33:42,792][Main][INFO] - [train] Step 4025 out of 20000 | Loss --> 1.941 | Grad_l2 --> 0.287 | Weights_l2 --> 10930.616 | Lr --> 0.009 | Seconds_per_step --> 6.423 | +[2024-09-26 12:36:21,643][Main][INFO] - [train] Step 4050 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.299 | Weights_l2 --> 10931.443 | Lr --> 0.009 | Seconds_per_step --> 6.354 | +[2024-09-26 12:39:00,738][Main][INFO] - [train] Step 4075 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.384 | Weights_l2 --> 10932.232 | Lr --> 0.009 | Seconds_per_step --> 6.364 | +[2024-09-26 12:41:39,948][Main][INFO] - [train] Step 4100 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.339 | Weights_l2 --> 10933.070 | Lr --> 0.009 | Seconds_per_step --> 6.368 | +[2024-09-26 12:44:19,208][Main][INFO] - [train] Step 4125 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.302 | Weights_l2 --> 10933.890 | Lr --> 0.009 | Seconds_per_step --> 6.370 | +[2024-09-26 12:46:59,798][Main][INFO] - [train] Step 4150 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.294 | Weights_l2 --> 10934.732 | Lr --> 0.009 | Seconds_per_step --> 6.424 | +[2024-09-26 12:49:38,794][Main][INFO] - [train] Step 4175 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.366 | Weights_l2 --> 10935.588 | Lr --> 0.009 | Seconds_per_step --> 6.360 | +[2024-09-26 12:52:17,936][Main][INFO] - [train] Step 4200 out of 20000 | Loss --> 1.953 | Grad_l2 --> 
0.446 | Weights_l2 --> 10936.422 | Lr --> 0.009 | Seconds_per_step --> 6.366 | +[2024-09-26 12:54:57,055][Main][INFO] - [train] Step 4225 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.298 | Weights_l2 --> 10937.257 | Lr --> 0.009 | Seconds_per_step --> 6.365 | +[2024-09-26 12:57:37,811][Main][INFO] - [train] Step 4250 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.312 | Weights_l2 --> 10938.113 | Lr --> 0.009 | Seconds_per_step --> 6.430 | +[2024-09-26 13:00:17,311][Main][INFO] - [train] Step 4275 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.281 | Weights_l2 --> 10938.971 | Lr --> 0.009 | Seconds_per_step --> 6.380 | +[2024-09-26 13:02:56,434][Main][INFO] - [train] Step 4300 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.311 | Weights_l2 --> 10939.825 | Lr --> 0.009 | Seconds_per_step --> 6.365 | +[2024-09-26 13:05:35,376][Main][INFO] - [train] Step 4325 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.519 | Weights_l2 --> 10940.709 | Lr --> 0.009 | Seconds_per_step --> 6.358 | +[2024-09-26 13:08:15,726][Main][INFO] - [train] Step 4350 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.295 | Weights_l2 --> 10941.585 | Lr --> 0.009 | Seconds_per_step --> 6.414 | +[2024-09-26 13:10:54,566][Main][INFO] - [train] Step 4375 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.292 | Weights_l2 --> 10942.459 | Lr --> 0.009 | Seconds_per_step --> 6.354 | +[2024-09-26 13:13:33,544][Main][INFO] - [train] Step 4400 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.307 | Weights_l2 --> 10943.354 | Lr --> 0.009 | Seconds_per_step --> 6.359 | +[2024-09-26 13:16:12,503][Main][INFO] - [train] Step 4425 out of 20000 | Loss --> 1.961 | Grad_l2 --> 0.268 | Weights_l2 --> 10944.268 | Lr --> 0.009 | Seconds_per_step --> 6.358 | +[2024-09-26 13:18:53,053][Main][INFO] - [train] Step 4450 out of 20000 | Loss --> 1.959 | Grad_l2 --> 0.276 | Weights_l2 --> 10945.139 | Lr --> 0.009 | Seconds_per_step --> 6.422 | +[2024-09-26 13:21:31,945][Main][INFO] - [train] Step 4475 out of 20000 | Loss --> 1.950 | Grad_l2 
--> 0.294 | Weights_l2 --> 10946.059 | Lr --> 0.009 | Seconds_per_step --> 6.356 | +[2024-09-26 13:24:10,873][Main][INFO] - [train] Step 4500 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.297 | Weights_l2 --> 10946.979 | Lr --> 0.009 | Seconds_per_step --> 6.357 | +[2024-09-26 13:26:49,831][Main][INFO] - [train] Step 4525 out of 20000 | Loss --> 1.964 | Grad_l2 --> 0.530 | Weights_l2 --> 10947.900 | Lr --> 0.010 | Seconds_per_step --> 6.358 | +[2024-09-26 13:29:29,122][Main][INFO] - [train] Step 4550 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.303 | Weights_l2 --> 10948.803 | Lr --> 0.010 | Seconds_per_step --> 6.372 | +[2024-09-26 13:32:10,085][Main][INFO] - [train] Step 4575 out of 20000 | Loss --> 1.955 | Grad_l2 --> 0.271 | Weights_l2 --> 10949.723 | Lr --> 0.010 | Seconds_per_step --> 6.438 | +[2024-09-26 13:34:49,279][Main][INFO] - [train] Step 4600 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.271 | Weights_l2 --> 10950.631 | Lr --> 0.010 | Seconds_per_step --> 6.368 | +[2024-09-26 13:37:28,237][Main][INFO] - [train] Step 4625 out of 20000 | Loss --> 1.944 | Grad_l2 --> 0.275 | Weights_l2 --> 10951.542 | Lr --> 0.010 | Seconds_per_step --> 6.358 | +[2024-09-26 13:40:07,086][Main][INFO] - [train] Step 4650 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.281 | Weights_l2 --> 10952.485 | Lr --> 0.010 | Seconds_per_step --> 6.354 | +[2024-09-26 13:42:47,817][Main][INFO] - [train] Step 4675 out of 20000 | Loss --> 1.962 | Grad_l2 --> 0.284 | Weights_l2 --> 10953.416 | Lr --> 0.010 | Seconds_per_step --> 6.429 | +[2024-09-26 13:45:26,883][Main][INFO] - [train] Step 4700 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.285 | Weights_l2 --> 10954.369 | Lr --> 0.010 | Seconds_per_step --> 6.363 | +[2024-09-26 13:48:05,853][Main][INFO] - [train] Step 4725 out of 20000 | Loss --> 1.969 | Grad_l2 --> 0.303 | Weights_l2 --> 10955.355 | Lr --> 0.010 | Seconds_per_step --> 6.359 | +[2024-09-26 13:50:44,830][Main][INFO] - [train] Step 4750 out of 20000 | Loss --> 1.966 | 
Grad_l2 --> 0.273 | Weights_l2 --> 10956.324 | Lr --> 0.010 | Seconds_per_step --> 6.359 | +[2024-09-26 13:53:25,460][Main][INFO] - [train] Step 4775 out of 20000 | Loss --> 1.961 | Grad_l2 --> 0.297 | Weights_l2 --> 10957.292 | Lr --> 0.010 | Seconds_per_step --> 6.425 | +[2024-09-26 13:56:04,451][Main][INFO] - [train] Step 4800 out of 20000 | Loss --> 1.963 | Grad_l2 --> 0.293 | Weights_l2 --> 10958.262 | Lr --> 0.010 | Seconds_per_step --> 6.360 | +[2024-09-26 13:58:43,787][Main][INFO] - [train] Step 4825 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.286 | Weights_l2 --> 10959.243 | Lr --> 0.010 | Seconds_per_step --> 6.373 | +[2024-09-26 14:01:22,897][Main][INFO] - [train] Step 4850 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.321 | Weights_l2 --> 10960.237 | Lr --> 0.010 | Seconds_per_step --> 6.364 | +[2024-09-26 14:04:03,439][Main][INFO] - [train] Step 4875 out of 20000 | Loss --> 1.976 | Grad_l2 --> 0.321 | Weights_l2 --> 10961.216 | Lr --> 0.010 | Seconds_per_step --> 6.422 | +[2024-09-26 14:06:42,326][Main][INFO] - [train] Step 4900 out of 20000 | Loss --> 1.977 | Grad_l2 --> 0.275 | Weights_l2 --> 10962.224 | Lr --> 0.010 | Seconds_per_step --> 6.355 | +[2024-09-26 14:09:21,502][Main][INFO] - [train] Step 4925 out of 20000 | Loss --> 1.980 | Grad_l2 --> 0.279 | Weights_l2 --> 10963.223 | Lr --> 0.010 | Seconds_per_step --> 6.367 | +[2024-09-26 14:12:00,728][Main][INFO] - [train] Step 4950 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.283 | Weights_l2 --> 10964.213 | Lr --> 0.010 | Seconds_per_step --> 6.369 | +[2024-09-26 14:14:39,767][Main][INFO] - [train] Step 4975 out of 20000 | Loss --> 1.976 | Grad_l2 --> 0.278 | Weights_l2 --> 10965.218 | Lr --> 0.010 | Seconds_per_step --> 6.361 | +[2024-09-26 14:17:20,223][Main][INFO] - [train] Step 5000 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.418 | Weights_l2 --> 10966.244 | Lr --> 0.010 | Seconds_per_step --> 6.418 | +[2024-09-26 14:17:20,224][accelerate.accelerator][INFO] - Saving current state to 
checkpoint-pt-5000 +[2024-09-26 14:17:20,231][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-26 14:17:26,875][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-5000/model.safetensors +[2024-09-26 14:17:36,098][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-5000/optimizer.bin +[2024-09-26 14:17:36,099][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-5000/scheduler.bin +[2024-09-26 14:17:36,100][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-5000/sampler.bin +[2024-09-26 14:17:36,100][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-5000/sampler_1.bin +[2024-09-26 14:17:36,101][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-5000/random_states_0.pkl +[2024-09-26 14:20:15,087][Main][INFO] - [train] Step 5025 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.311 | Weights_l2 --> 10967.260 | Lr --> 0.010 | Seconds_per_step --> 6.994 | +[2024-09-26 14:22:54,001][Main][INFO] - [train] Step 5050 out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.532 | Weights_l2 --> 10968.271 | Lr --> 0.010 | Seconds_per_step --> 6.356 | +[2024-09-26 14:25:32,779][Main][INFO] - [train] Step 5075 out of 20000 | Loss --> 1.976 | Grad_l2 --> 0.317 | Weights_l2 --> 10969.264 | Lr --> 0.010 | Seconds_per_step --> 6.351 | +[2024-09-26 14:28:13,392][Main][INFO] - [train] Step 5100 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.306 | Weights_l2 --> 10970.293 | Lr --> 0.010 | Seconds_per_step --> 6.424 | +[2024-09-26 14:30:52,361][Main][INFO] - [train] Step 5125 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.293 | Weights_l2 --> 10971.331 | Lr --> 0.010 | Seconds_per_step --> 6.359 | +[2024-09-26 14:33:31,174][Main][INFO] - [train] Step 5150 out of 
20000 | Loss --> 1.975 | Grad_l2 --> 0.266 | Weights_l2 --> 10972.376 | Lr --> 0.010 | Seconds_per_step --> 6.352 | +[2024-09-26 14:36:09,998][Main][INFO] - [train] Step 5175 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.298 | Weights_l2 --> 10973.395 | Lr --> 0.010 | Seconds_per_step --> 6.353 | +[2024-09-26 14:38:50,729][Main][INFO] - [train] Step 5200 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.277 | Weights_l2 --> 10974.404 | Lr --> 0.010 | Seconds_per_step --> 6.429 | +[2024-09-26 14:41:29,973][Main][INFO] - [train] Step 5225 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.273 | Weights_l2 --> 10975.413 | Lr --> 0.010 | Seconds_per_step --> 6.370 | +[2024-09-26 14:44:09,050][Main][INFO] - [train] Step 5250 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.273 | Weights_l2 --> 10976.438 | Lr --> 0.010 | Seconds_per_step --> 6.363 | +[2024-09-26 14:46:48,050][Main][INFO] - [train] Step 5275 out of 20000 | Loss --> 1.986 | Grad_l2 --> 0.277 | Weights_l2 --> 10977.458 | Lr --> 0.010 | Seconds_per_step --> 6.360 | +[2024-09-26 14:49:28,520][Main][INFO] - [train] Step 5300 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.302 | Weights_l2 --> 10978.479 | Lr --> 0.010 | Seconds_per_step --> 6.419 | +[2024-09-26 14:52:07,499][Main][INFO] - [train] Step 5325 out of 20000 | Loss --> 1.982 | Grad_l2 --> 0.400 | Weights_l2 --> 10979.505 | Lr --> 0.010 | Seconds_per_step --> 6.359 | +[2024-09-26 14:54:46,462][Main][INFO] - [train] Step 5350 out of 20000 | Loss --> 1.971 | Grad_l2 --> 0.268 | Weights_l2 --> 10980.513 | Lr --> 0.010 | Seconds_per_step --> 6.358 | +[2024-09-26 14:57:25,662][Main][INFO] - [train] Step 5375 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.274 | Weights_l2 --> 10981.546 | Lr --> 0.010 | Seconds_per_step --> 6.368 | +[2024-09-26 15:00:06,546][Main][INFO] - [train] Step 5400 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.264 | Weights_l2 --> 10982.540 | Lr --> 0.010 | Seconds_per_step --> 6.435 | +[2024-09-26 15:02:45,536][Main][INFO] - [train] Step 5425 out 
of 20000 | Loss --> 1.973 | Grad_l2 --> 0.272 | Weights_l2 --> 10983.559 | Lr --> 0.010 | Seconds_per_step --> 6.360 | +[2024-09-26 15:05:24,772][Main][INFO] - [train] Step 5450 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.261 | Weights_l2 --> 10984.589 | Lr --> 0.010 | Seconds_per_step --> 6.369 | +[2024-09-26 15:08:03,927][Main][INFO] - [train] Step 5475 out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.256 | Weights_l2 --> 10985.591 | Lr --> 0.010 | Seconds_per_step --> 6.366 | +[2024-09-26 15:10:42,953][Main][INFO] - [train] Step 5500 out of 20000 | Loss --> 1.982 | Grad_l2 --> 0.445 | Weights_l2 --> 10986.617 | Lr --> 0.010 | Seconds_per_step --> 6.361 | +[2024-09-26 15:13:23,643][Main][INFO] - [train] Step 5525 out of 20000 | Loss --> 1.971 | Grad_l2 --> 0.270 | Weights_l2 --> 10987.599 | Lr --> 0.010 | Seconds_per_step --> 6.428 | +[2024-09-26 15:16:02,577][Main][INFO] - [train] Step 5550 out of 20000 | Loss --> 1.982 | Grad_l2 --> 0.262 | Weights_l2 --> 10988.612 | Lr --> 0.010 | Seconds_per_step --> 6.357 | +[2024-09-26 15:18:41,435][Main][INFO] - [train] Step 5575 out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.270 | Weights_l2 --> 10989.625 | Lr --> 0.010 | Seconds_per_step --> 6.354 | +[2024-09-26 15:21:20,344][Main][INFO] - [train] Step 5600 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.270 | Weights_l2 --> 10990.638 | Lr --> 0.010 | Seconds_per_step --> 6.356 | +[2024-09-26 15:24:00,824][Main][INFO] - [train] Step 5625 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.274 | Weights_l2 --> 10991.655 | Lr --> 0.010 | Seconds_per_step --> 6.419 | +[2024-09-26 15:26:39,629][Main][INFO] - [train] Step 5650 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.255 | Weights_l2 --> 10992.647 | Lr --> 0.010 | Seconds_per_step --> 6.352 | +[2024-09-26 15:29:18,632][Main][INFO] - [train] Step 5675 out of 20000 | Loss --> 1.976 | Grad_l2 --> 0.266 | Weights_l2 --> 10993.620 | Lr --> 0.010 | Seconds_per_step --> 6.360 | +[2024-09-26 15:31:57,572][Main][INFO] - [train] Step 5700 
out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.260 | Weights_l2 --> 10994.613 | Lr --> 0.010 | Seconds_per_step --> 6.358 | +[2024-09-26 15:34:38,166][Main][INFO] - [train] Step 5725 out of 20000 | Loss --> 1.986 | Grad_l2 --> 0.267 | Weights_l2 --> 10995.630 | Lr --> 0.010 | Seconds_per_step --> 6.424 | +[2024-09-26 15:37:17,198][Main][INFO] - [train] Step 5750 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.267 | Weights_l2 --> 10996.643 | Lr --> 0.010 | Seconds_per_step --> 6.361 | +[2024-09-26 15:39:56,251][Main][INFO] - [train] Step 5775 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.541 | Weights_l2 --> 10997.625 | Lr --> 0.010 | Seconds_per_step --> 6.362 | +[2024-09-26 15:42:35,223][Main][INFO] - [train] Step 5800 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.297 | Weights_l2 --> 10998.615 | Lr --> 0.010 | Seconds_per_step --> 6.359 | +[2024-09-26 15:45:15,847][Main][INFO] - [train] Step 5825 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.257 | Weights_l2 --> 10999.598 | Lr --> 0.010 | Seconds_per_step --> 6.425 | +[2024-09-26 15:47:54,621][Main][INFO] - [train] Step 5850 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.261 | Weights_l2 --> 11000.634 | Lr --> 0.010 | Seconds_per_step --> 6.351 | +[2024-09-26 15:50:33,494][Main][INFO] - [train] Step 5875 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.268 | Weights_l2 --> 11001.642 | Lr --> 0.010 | Seconds_per_step --> 6.355 | +[2024-09-26 15:53:12,499][Main][INFO] - [train] Step 5900 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.266 | Weights_l2 --> 11002.642 | Lr --> 0.010 | Seconds_per_step --> 6.360 | +[2024-09-26 15:55:51,406][Main][INFO] - [train] Step 5925 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.273 | Weights_l2 --> 11003.648 | Lr --> 0.010 | Seconds_per_step --> 6.356 | +[2024-09-26 15:58:31,873][Main][INFO] - [train] Step 5950 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.260 | Weights_l2 --> 11004.664 | Lr --> 0.010 | Seconds_per_step --> 6.419 | +[2024-09-26 16:01:11,199][Main][INFO] - [train] Step 
5975 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.249 | Weights_l2 --> 11005.655 | Lr --> 0.010 | Seconds_per_step --> 6.373 | +[2024-09-26 16:03:50,242][Main][INFO] - [train] Step 6000 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.273 | Weights_l2 --> 11006.645 | Lr --> 0.010 | Seconds_per_step --> 6.362 | +[2024-09-26 16:06:29,429][Main][INFO] - [train] Step 6025 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.274 | Weights_l2 --> 11007.650 | Lr --> 0.010 | Seconds_per_step --> 6.367 | +[2024-09-26 16:09:09,951][Main][INFO] - [train] Step 6050 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.293 | Weights_l2 --> 11008.629 | Lr --> 0.010 | Seconds_per_step --> 6.421 | +[2024-09-26 16:11:48,875][Main][INFO] - [train] Step 6075 out of 20000 | Loss --> 1.974 | Grad_l2 --> 0.268 | Weights_l2 --> 11009.618 | Lr --> 0.010 | Seconds_per_step --> 6.357 | +[2024-09-26 16:14:27,876][Main][INFO] - [train] Step 6100 out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.261 | Weights_l2 --> 11010.633 | Lr --> 0.010 | Seconds_per_step --> 6.360 | +[2024-09-26 16:17:06,972][Main][INFO] - [train] Step 6125 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.271 | Weights_l2 --> 11011.623 | Lr --> 0.010 | Seconds_per_step --> 6.364 | +[2024-09-26 16:19:47,815][Main][INFO] - [train] Step 6150 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.287 | Weights_l2 --> 11012.611 | Lr --> 0.010 | Seconds_per_step --> 6.434 | +[2024-09-26 16:22:26,834][Main][INFO] - [train] Step 6175 out of 20000 | Loss --> 1.983 | Grad_l2 --> 0.293 | Weights_l2 --> 11013.622 | Lr --> 0.010 | Seconds_per_step --> 6.361 | +[2024-09-26 16:25:05,681][Main][INFO] - [train] Step 6200 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.285 | Weights_l2 --> 11014.612 | Lr --> 0.010 | Seconds_per_step --> 6.354 | +[2024-09-26 16:27:44,610][Main][INFO] - [train] Step 6225 out of 20000 | Loss --> 1.969 | Grad_l2 --> 0.256 | Weights_l2 --> 11015.591 | Lr --> 0.010 | Seconds_per_step --> 6.357 | +[2024-09-26 16:30:25,108][Main][INFO] - [train] 
Step 6250 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.256 | Weights_l2 --> 11016.560 | Lr --> 0.010 | Seconds_per_step --> 6.420 | +[2024-09-26 16:33:03,962][Main][INFO] - [train] Step 6275 out of 20000 | Loss --> 1.974 | Grad_l2 --> 0.262 | Weights_l2 --> 11017.562 | Lr --> 0.010 | Seconds_per_step --> 6.354 | +[2024-09-26 16:35:42,787][Main][INFO] - [train] Step 6300 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.277 | Weights_l2 --> 11018.566 | Lr --> 0.010 | Seconds_per_step --> 6.353 | +[2024-09-26 16:38:21,945][Main][INFO] - [train] Step 6325 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.398 | Weights_l2 --> 11019.527 | Lr --> 0.010 | Seconds_per_step --> 6.366 | +[2024-09-26 16:41:01,324][Main][INFO] - [train] Step 6350 out of 20000 | Loss --> 1.974 | Grad_l2 --> 0.248 | Weights_l2 --> 11020.504 | Lr --> 0.010 | Seconds_per_step --> 6.375 | +[2024-09-26 16:43:42,138][Main][INFO] - [train] Step 6375 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.267 | Weights_l2 --> 11021.468 | Lr --> 0.010 | Seconds_per_step --> 6.432 | +[2024-09-26 16:46:21,048][Main][INFO] - [train] Step 6400 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.254 | Weights_l2 --> 11022.446 | Lr --> 0.010 | Seconds_per_step --> 6.356 | +[2024-09-26 16:49:00,075][Main][INFO] - [train] Step 6425 out of 20000 | Loss --> 1.977 | Grad_l2 --> 0.252 | Weights_l2 --> 11023.425 | Lr --> 0.010 | Seconds_per_step --> 6.361 | +[2024-09-26 16:51:39,392][Main][INFO] - [train] Step 6450 out of 20000 | Loss --> 1.958 | Grad_l2 --> 0.256 | Weights_l2 --> 11024.378 | Lr --> 0.010 | Seconds_per_step --> 6.373 | +[2024-09-26 16:54:20,541][Main][INFO] - [train] Step 6475 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.258 | Weights_l2 --> 11025.353 | Lr --> 0.010 | Seconds_per_step --> 6.446 | +[2024-09-26 16:56:59,698][Main][INFO] - [train] Step 6500 out of 20000 | Loss --> 1.971 | Grad_l2 --> 0.339 | Weights_l2 --> 11026.310 | Lr --> 0.010 | Seconds_per_step --> 6.366 | +[2024-09-26 16:59:39,191][Main][INFO] - 
[train] Step 6525 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.259 | Weights_l2 --> 11027.288 | Lr --> 0.010 | Seconds_per_step --> 6.380 | +[2024-09-26 17:02:18,573][Main][INFO] - [train] Step 6550 out of 20000 | Loss --> 1.961 | Grad_l2 --> 0.265 | Weights_l2 --> 11028.271 | Lr --> 0.010 | Seconds_per_step --> 6.375 | +[2024-09-26 17:04:59,754][Main][INFO] - [train] Step 6575 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.253 | Weights_l2 --> 11029.238 | Lr --> 0.010 | Seconds_per_step --> 6.447 | +[2024-09-26 17:07:38,990][Main][INFO] - [train] Step 6600 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.249 | Weights_l2 --> 11030.206 | Lr --> 0.010 | Seconds_per_step --> 6.369 | +[2024-09-26 17:10:17,893][Main][INFO] - [train] Step 6625 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.254 | Weights_l2 --> 11031.163 | Lr --> 0.010 | Seconds_per_step --> 6.356 | +[2024-09-26 17:12:56,927][Main][INFO] - [train] Step 6650 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.324 | Weights_l2 --> 11032.122 | Lr --> 0.010 | Seconds_per_step --> 6.361 | +[2024-09-26 17:15:36,171][Main][INFO] - [train] Step 6675 out of 20000 | Loss --> 1.963 | Grad_l2 --> 0.506 | Weights_l2 --> 11033.059 | Lr --> 0.010 | Seconds_per_step --> 6.370 | +[2024-09-26 17:18:16,981][Main][INFO] - [train] Step 6700 out of 20000 | Loss --> 1.961 | Grad_l2 --> 0.269 | Weights_l2 --> 11033.983 | Lr --> 0.010 | Seconds_per_step --> 6.432 | +[2024-09-26 17:20:56,295][Main][INFO] - [train] Step 6725 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.255 | Weights_l2 --> 11034.898 | Lr --> 0.010 | Seconds_per_step --> 6.372 | +[2024-09-26 17:23:35,730][Main][INFO] - [train] Step 6750 out of 20000 | Loss --> 1.956 | Grad_l2 --> 0.250 | Weights_l2 --> 11035.854 | Lr --> 0.010 | Seconds_per_step --> 6.377 | +[2024-09-26 17:26:14,887][Main][INFO] - [train] Step 6775 out of 20000 | Loss --> 1.964 | Grad_l2 --> 0.264 | Weights_l2 --> 11036.789 | Lr --> 0.010 | Seconds_per_step --> 6.366 | +[2024-09-26 17:28:55,677][Main][INFO] 
- [train] Step 6800 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.259 | Weights_l2 --> 11037.759 | Lr --> 0.010 | Seconds_per_step --> 6.431 | +[2024-09-26 17:31:34,582][Main][INFO] - [train] Step 6825 out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.252 | Weights_l2 --> 11038.698 | Lr --> 0.010 | Seconds_per_step --> 6.356 | +[2024-09-26 17:34:13,737][Main][INFO] - [train] Step 6850 out of 20000 | Loss --> 1.959 | Grad_l2 --> 0.246 | Weights_l2 --> 11039.653 | Lr --> 0.010 | Seconds_per_step --> 6.366 | +[2024-09-26 17:36:53,071][Main][INFO] - [train] Step 6875 out of 20000 | Loss --> 1.959 | Grad_l2 --> 0.249 | Weights_l2 --> 11040.561 | Lr --> 0.010 | Seconds_per_step --> 6.373 | +[2024-09-26 17:39:34,348][Main][INFO] - [train] Step 6900 out of 20000 | Loss --> 1.970 | Grad_l2 --> 0.256 | Weights_l2 --> 11041.521 | Lr --> 0.010 | Seconds_per_step --> 6.451 | +[2024-09-26 17:42:13,864][Main][INFO] - [train] Step 6925 out of 20000 | Loss --> 1.964 | Grad_l2 --> 0.252 | Weights_l2 --> 11042.448 | Lr --> 0.010 | Seconds_per_step --> 6.381 | +[2024-09-26 17:44:53,119][Main][INFO] - [train] Step 6950 out of 20000 | Loss --> 1.963 | Grad_l2 --> 0.443 | Weights_l2 --> 11043.374 | Lr --> 0.010 | Seconds_per_step --> 6.370 | +[2024-09-26 17:47:32,194][Main][INFO] - [train] Step 6975 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.377 | Weights_l2 --> 11044.286 | Lr --> 0.010 | Seconds_per_step --> 6.363 | +[2024-09-26 17:50:12,775][Main][INFO] - [train] Step 7000 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.258 | Weights_l2 --> 11045.216 | Lr --> 0.010 | Seconds_per_step --> 6.423 | +[2024-09-26 17:52:51,842][Main][INFO] - [train] Step 7025 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.247 | Weights_l2 --> 11046.151 | Lr --> 0.010 | Seconds_per_step --> 6.363 | +[2024-09-26 17:55:30,968][Main][INFO] - [train] Step 7050 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.249 | Weights_l2 --> 11047.072 | Lr --> 0.010 | Seconds_per_step --> 6.365 | +[2024-09-26 
17:58:10,089][Main][INFO] - [train] Step 7075 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.245 | Weights_l2 --> 11048.003 | Lr --> 0.010 | Seconds_per_step --> 6.365 | +[2024-09-26 18:00:50,812][Main][INFO] - [train] Step 7100 out of 20000 | Loss --> 1.964 | Grad_l2 --> 0.249 | Weights_l2 --> 11048.927 | Lr --> 0.010 | Seconds_per_step --> 6.429 | +[2024-09-26 18:03:29,978][Main][INFO] - [train] Step 7125 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.257 | Weights_l2 --> 11049.849 | Lr --> 0.010 | Seconds_per_step --> 6.367 | +[2024-09-26 18:06:09,252][Main][INFO] - [train] Step 7150 out of 20000 | Loss --> 1.959 | Grad_l2 --> 0.251 | Weights_l2 --> 11050.757 | Lr --> 0.010 | Seconds_per_step --> 6.371 | +[2024-09-26 18:08:48,377][Main][INFO] - [train] Step 7175 out of 20000 | Loss --> 1.958 | Grad_l2 --> 0.252 | Weights_l2 --> 11051.654 | Lr --> 0.009 | Seconds_per_step --> 6.365 | +[2024-09-26 18:11:27,262][Main][INFO] - [train] Step 7200 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.252 | Weights_l2 --> 11052.578 | Lr --> 0.009 | Seconds_per_step --> 6.355 | +[2024-09-26 18:14:07,580][Main][INFO] - [train] Step 7225 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.249 | Weights_l2 --> 11053.501 | Lr --> 0.009 | Seconds_per_step --> 6.413 | +[2024-09-26 18:16:46,110][Main][INFO] - [train] Step 7250 out of 20000 | Loss --> 1.953 | Grad_l2 --> 0.254 | Weights_l2 --> 11054.420 | Lr --> 0.009 | Seconds_per_step --> 6.341 | +[2024-09-26 18:19:24,544][Main][INFO] - [train] Step 7275 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.251 | Weights_l2 --> 11055.324 | Lr --> 0.009 | Seconds_per_step --> 6.337 | +[2024-09-26 18:22:02,971][Main][INFO] - [train] Step 7300 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.279 | Weights_l2 --> 11056.209 | Lr --> 0.009 | Seconds_per_step --> 6.337 | +[2024-09-26 18:24:43,108][Main][INFO] - [train] Step 7325 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.484 | Weights_l2 --> 11057.090 | Lr --> 0.009 | Seconds_per_step --> 6.405 | 
+[2024-09-26 18:27:21,678][Main][INFO] - [train] Step 7350 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.256 | Weights_l2 --> 11057.990 | Lr --> 0.009 | Seconds_per_step --> 6.343 | +[2024-09-26 18:30:00,175][Main][INFO] - [train] Step 7375 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.248 | Weights_l2 --> 11058.875 | Lr --> 0.009 | Seconds_per_step --> 6.340 | +[2024-09-26 18:32:38,855][Main][INFO] - [train] Step 7400 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.261 | Weights_l2 --> 11059.781 | Lr --> 0.009 | Seconds_per_step --> 6.347 | +[2024-09-26 18:35:19,579][Main][INFO] - [train] Step 7425 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.260 | Weights_l2 --> 11060.662 | Lr --> 0.009 | Seconds_per_step --> 6.429 | +[2024-09-26 18:37:57,854][Main][INFO] - [train] Step 7450 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.244 | Weights_l2 --> 11061.538 | Lr --> 0.009 | Seconds_per_step --> 6.331 | +[2024-09-26 18:40:36,537][Main][INFO] - [train] Step 7475 out of 20000 | Loss --> 1.962 | Grad_l2 --> 0.258 | Weights_l2 --> 11062.401 | Lr --> 0.009 | Seconds_per_step --> 6.347 | +[2024-09-26 18:43:15,092][Main][INFO] - [train] Step 7500 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.252 | Weights_l2 --> 11063.299 | Lr --> 0.009 | Seconds_per_step --> 6.342 | +[2024-09-26 18:43:15,093][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-7500 +[2024-09-26 18:43:15,100][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-26 18:43:23,242][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-7500/model.safetensors +[2024-09-26 18:43:33,331][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-7500/optimizer.bin +[2024-09-26 18:43:33,333][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-7500/scheduler.bin +[2024-09-26 18:43:33,333][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-7500/sampler.bin +[2024-09-26 18:43:33,333][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-7500/sampler_1.bin +[2024-09-26 18:43:33,334][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-7500/random_states_0.pkl +[2024-09-26 18:46:12,969][Main][INFO] - [train] Step 7525 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.255 | Weights_l2 --> 11064.180 | Lr --> 0.009 | Seconds_per_step --> 7.115 | +[2024-09-26 18:48:51,068][Main][INFO] - [train] Step 7550 out of 20000 | Loss --> 1.957 | Grad_l2 --> 0.252 | Weights_l2 --> 11065.072 | Lr --> 0.009 | Seconds_per_step --> 6.324 | +[2024-09-26 18:51:29,268][Main][INFO] - [train] Step 7575 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.256 | Weights_l2 --> 11065.939 | Lr --> 0.009 | Seconds_per_step --> 6.328 | +[2024-09-26 18:54:07,545][Main][INFO] - [train] Step 7600 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.245 | Weights_l2 --> 11066.801 | Lr --> 0.009 | Seconds_per_step --> 6.331 | +[2024-09-26 18:56:45,833][Main][INFO] - [train] Step 7625 out of 20000 | Loss --> 1.954 | Grad_l2 --> 0.239 | Weights_l2 --> 11067.666 | Lr --> 0.009 | Seconds_per_step --> 6.331 | +[2024-09-26 18:59:25,630][Main][INFO] - [train] Step 7650 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.247 | Weights_l2 --> 11068.523 | Lr --> 0.009 | Seconds_per_step --> 6.392 | +[2024-09-26 19:02:04,057][Main][INFO] - [train] Step 7675 out of 
20000 | Loss --> 1.958 | Grad_l2 --> 0.264 | Weights_l2 --> 11069.385 | Lr --> 0.009 | Seconds_per_step --> 6.337 | +[2024-09-26 19:04:42,592][Main][INFO] - [train] Step 7700 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.247 | Weights_l2 --> 11070.257 | Lr --> 0.009 | Seconds_per_step --> 6.341 | +[2024-09-26 19:07:21,286][Main][INFO] - [train] Step 7725 out of 20000 | Loss --> 1.947 | Grad_l2 --> 0.251 | Weights_l2 --> 11071.108 | Lr --> 0.009 | Seconds_per_step --> 6.348 | +[2024-09-26 19:10:01,383][Main][INFO] - [train] Step 7750 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.254 | Weights_l2 --> 11071.971 | Lr --> 0.009 | Seconds_per_step --> 6.404 | +[2024-09-26 19:12:39,962][Main][INFO] - [train] Step 7775 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.278 | Weights_l2 --> 11072.846 | Lr --> 0.009 | Seconds_per_step --> 6.343 | +[2024-09-26 19:15:18,593][Main][INFO] - [train] Step 7800 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.254 | Weights_l2 --> 11073.694 | Lr --> 0.009 | Seconds_per_step --> 6.345 | +[2024-09-26 19:17:56,997][Main][INFO] - [train] Step 7825 out of 20000 | Loss --> 1.955 | Grad_l2 --> 0.265 | Weights_l2 --> 11074.541 | Lr --> 0.009 | Seconds_per_step --> 6.336 | +[2024-09-26 19:20:38,133][Main][INFO] - [train] Step 7850 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.249 | Weights_l2 --> 11075.391 | Lr --> 0.009 | Seconds_per_step --> 6.445 | +[2024-09-26 19:23:16,830][Main][INFO] - [train] Step 7875 out of 20000 | Loss --> 1.935 | Grad_l2 --> 0.266 | Weights_l2 --> 11076.212 | Lr --> 0.009 | Seconds_per_step --> 6.348 | +[2024-09-26 19:25:55,647][Main][INFO] - [train] Step 7900 out of 20000 | Loss --> 1.939 | Grad_l2 --> 0.261 | Weights_l2 --> 11077.043 | Lr --> 0.009 | Seconds_per_step --> 6.353 | +[2024-09-26 19:28:34,700][Main][INFO] - [train] Step 7925 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.362 | Weights_l2 --> 11077.890 | Lr --> 0.009 | Seconds_per_step --> 6.362 | +[2024-09-26 19:31:15,322][Main][INFO] - [train] Step 7950 out 
of 20000 | Loss --> 1.936 | Grad_l2 --> 0.245 | Weights_l2 --> 11078.730 | Lr --> 0.009 | Seconds_per_step --> 6.425 | +[2024-09-26 19:33:54,119][Main][INFO] - [train] Step 7975 out of 20000 | Loss --> 1.938 | Grad_l2 --> 0.241 | Weights_l2 --> 11079.560 | Lr --> 0.009 | Seconds_per_step --> 6.352 | +[2024-09-26 19:36:32,556][Main][INFO] - [train] Step 8000 out of 20000 | Loss --> 1.945 | Grad_l2 --> 0.249 | Weights_l2 --> 11080.389 | Lr --> 0.009 | Seconds_per_step --> 6.337 | +[2024-09-26 19:39:11,017][Main][INFO] - [train] Step 8025 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.244 | Weights_l2 --> 11081.217 | Lr --> 0.009 | Seconds_per_step --> 6.338 | +[2024-09-26 19:41:49,619][Main][INFO] - [train] Step 8050 out of 20000 | Loss --> 1.934 | Grad_l2 --> 0.237 | Weights_l2 --> 11082.035 | Lr --> 0.009 | Seconds_per_step --> 6.344 | +[2024-09-26 19:44:29,746][Main][INFO] - [train] Step 8075 out of 20000 | Loss --> 1.942 | Grad_l2 --> 0.240 | Weights_l2 --> 11082.867 | Lr --> 0.009 | Seconds_per_step --> 6.405 | +[2024-09-26 19:47:08,224][Main][INFO] - [train] Step 8100 out of 20000 | Loss --> 1.920 | Grad_l2 --> 0.237 | Weights_l2 --> 11083.671 | Lr --> 0.009 | Seconds_per_step --> 6.339 | +[2024-09-26 19:49:46,702][Main][INFO] - [train] Step 8125 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.252 | Weights_l2 --> 11084.484 | Lr --> 0.009 | Seconds_per_step --> 6.339 | +[2024-09-26 19:52:25,287][Main][INFO] - [train] Step 8150 out of 20000 | Loss --> 1.925 | Grad_l2 --> 0.246 | Weights_l2 --> 11085.299 | Lr --> 0.009 | Seconds_per_step --> 6.343 | +[2024-09-26 19:55:05,855][Main][INFO] - [train] Step 8175 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.238 | Weights_l2 --> 11086.109 | Lr --> 0.009 | Seconds_per_step --> 6.423 | +[2024-09-26 19:57:45,002][Main][INFO] - [train] Step 8200 out of 20000 | Loss --> 1.931 | Grad_l2 --> 0.251 | Weights_l2 --> 11086.899 | Lr --> 0.009 | Seconds_per_step --> 6.366 | +[2024-09-26 20:00:23,952][Main][INFO] - [train] Step 8225 
out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.246 | Weights_l2 --> 11087.696 | Lr --> 0.009 | Seconds_per_step --> 6.358 | +[2024-09-26 20:03:02,535][Main][INFO] - [train] Step 8250 out of 20000 | Loss --> 1.937 | Grad_l2 --> 0.258 | Weights_l2 --> 11088.502 | Lr --> 0.009 | Seconds_per_step --> 6.343 | +[2024-09-26 20:05:43,181][Main][INFO] - [train] Step 8275 out of 20000 | Loss --> 1.926 | Grad_l2 --> 0.263 | Weights_l2 --> 11089.294 | Lr --> 0.009 | Seconds_per_step --> 6.426 | +[2024-09-26 20:08:22,554][Main][INFO] - [train] Step 8300 out of 20000 | Loss --> 1.919 | Grad_l2 --> 0.249 | Weights_l2 --> 11090.091 | Lr --> 0.009 | Seconds_per_step --> 6.375 | +[2024-09-26 20:10:14,326][huggingface_hub.utils._http][WARNING] - '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: f05a1204-4970-4176-8025-cfe737122ff3)')' thrown while requesting GET https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/resolve/3ba9d605774198c5868892d7a8deda78031a781f/fineweb-edu-dedup/train-00193-of-00234.parquet +[2024-09-26 20:10:14,327][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. 
+[2024-09-26 20:11:01,516][Main][INFO] - [train] Step 8325 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.258 | Weights_l2 --> 11090.911 | Lr --> 0.009 | Seconds_per_step --> 6.358 | +[2024-09-26 20:13:40,094][Main][INFO] - [train] Step 8350 out of 20000 | Loss --> 1.919 | Grad_l2 --> 0.254 | Weights_l2 --> 11091.686 | Lr --> 0.009 | Seconds_per_step --> 6.343 | +[2024-09-26 20:16:20,119][Main][INFO] - [train] Step 8375 out of 20000 | Loss --> 1.943 | Grad_l2 --> 0.240 | Weights_l2 --> 11092.475 | Lr --> 0.009 | Seconds_per_step --> 6.401 | +[2024-09-26 20:18:58,577][Main][INFO] - [train] Step 8400 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.297 | Weights_l2 --> 11093.224 | Lr --> 0.009 | Seconds_per_step --> 6.338 | +[2024-09-26 20:21:37,112][Main][INFO] - [train] Step 8425 out of 20000 | Loss --> 1.929 | Grad_l2 --> 0.475 | Weights_l2 --> 11093.968 | Lr --> 0.009 | Seconds_per_step --> 6.341 | +[2024-09-26 20:24:15,586][Main][INFO] - [train] Step 8450 out of 20000 | Loss --> 1.907 | Grad_l2 --> 0.243 | Weights_l2 --> 11094.737 | Lr --> 0.009 | Seconds_per_step --> 6.339 | +[2024-09-26 20:26:55,496][Main][INFO] - [train] Step 8475 out of 20000 | Loss --> 1.918 | Grad_l2 --> 0.236 | Weights_l2 --> 11095.491 | Lr --> 0.009 | Seconds_per_step --> 6.396 | +[2024-09-26 20:29:34,086][Main][INFO] - [train] Step 8500 out of 20000 | Loss --> 1.915 | Grad_l2 --> 0.240 | Weights_l2 --> 11096.255 | Lr --> 0.009 | Seconds_per_step --> 6.344 | +[2024-09-26 20:32:12,765][Main][INFO] - [train] Step 8525 out of 20000 | Loss --> 1.914 | Grad_l2 --> 0.235 | Weights_l2 --> 11097.010 | Lr --> 0.009 | Seconds_per_step --> 6.347 | +[2024-09-26 20:34:51,548][Main][INFO] - [train] Step 8550 out of 20000 | Loss --> 1.921 | Grad_l2 --> 0.243 | Weights_l2 --> 11097.768 | Lr --> 0.009 | Seconds_per_step --> 6.351 | +[2024-09-26 20:37:30,317][Main][INFO] - [train] Step 8575 out of 20000 | Loss --> 1.912 | Grad_l2 --> 0.261 | Weights_l2 --> 11098.524 | Lr --> 0.009 | Seconds_per_step --> 6.351 
| +[2024-09-26 20:40:11,030][Main][INFO] - [train] Step 8600 out of 20000 | Loss --> 1.914 | Grad_l2 --> 0.246 | Weights_l2 --> 11099.279 | Lr --> 0.009 | Seconds_per_step --> 6.428 | +[2024-09-26 20:42:49,713][Main][INFO] - [train] Step 8625 out of 20000 | Loss --> 1.901 | Grad_l2 --> 0.239 | Weights_l2 --> 11100.026 | Lr --> 0.009 | Seconds_per_step --> 6.347 | +[2024-09-26 20:45:28,307][Main][INFO] - [train] Step 8650 out of 20000 | Loss --> 1.910 | Grad_l2 --> 0.246 | Weights_l2 --> 11100.763 | Lr --> 0.009 | Seconds_per_step --> 6.344 | +[2024-09-26 20:48:06,950][Main][INFO] - [train] Step 8675 out of 20000 | Loss --> 1.911 | Grad_l2 --> 0.247 | Weights_l2 --> 11101.515 | Lr --> 0.009 | Seconds_per_step --> 6.346 | +[2024-09-26 20:50:47,374][Main][INFO] - [train] Step 8700 out of 20000 | Loss --> 1.908 | Grad_l2 --> 0.287 | Weights_l2 --> 11102.272 | Lr --> 0.009 | Seconds_per_step --> 6.417 | +[2024-09-26 20:53:26,098][Main][INFO] - [train] Step 8725 out of 20000 | Loss --> 1.905 | Grad_l2 --> 0.576 | Weights_l2 --> 11102.972 | Lr --> 0.009 | Seconds_per_step --> 6.349 | +[2024-09-26 20:56:04,790][Main][INFO] - [train] Step 8750 out of 20000 | Loss --> 1.897 | Grad_l2 --> 0.233 | Weights_l2 --> 11103.689 | Lr --> 0.009 | Seconds_per_step --> 6.348 | +[2024-09-26 20:58:43,474][Main][INFO] - [train] Step 8775 out of 20000 | Loss --> 1.906 | Grad_l2 --> 0.233 | Weights_l2 --> 11104.413 | Lr --> 0.009 | Seconds_per_step --> 6.347 | +[2024-09-26 21:01:23,604][Main][INFO] - [train] Step 8800 out of 20000 | Loss --> 1.905 | Grad_l2 --> 0.233 | Weights_l2 --> 11105.133 | Lr --> 0.009 | Seconds_per_step --> 6.405 | +[2024-09-26 21:04:02,270][Main][INFO] - [train] Step 8825 out of 20000 | Loss --> 1.907 | Grad_l2 --> 0.242 | Weights_l2 --> 11105.869 | Lr --> 0.008 | Seconds_per_step --> 6.347 | +[2024-09-26 21:06:40,999][Main][INFO] - [train] Step 8850 out of 20000 | Loss --> 1.897 | Grad_l2 --> 0.232 | Weights_l2 --> 11106.597 | Lr --> 0.008 | Seconds_per_step --> 
6.349 | +[2024-09-26 21:09:19,786][Main][INFO] - [train] Step 8875 out of 20000 | Loss --> 1.902 | Grad_l2 --> 0.234 | Weights_l2 --> 11107.286 | Lr --> 0.008 | Seconds_per_step --> 6.351 | +[2024-09-26 21:11:58,554][Main][INFO] - [train] Step 8900 out of 20000 | Loss --> 1.898 | Grad_l2 --> 0.237 | Weights_l2 --> 11108.006 | Lr --> 0.008 | Seconds_per_step --> 6.351 | +[2024-09-26 21:14:38,738][Main][INFO] - [train] Step 8925 out of 20000 | Loss --> 1.910 | Grad_l2 --> 0.238 | Weights_l2 --> 11108.724 | Lr --> 0.008 | Seconds_per_step --> 6.407 | +[2024-09-26 21:17:17,432][Main][INFO] - [train] Step 8950 out of 20000 | Loss --> 1.907 | Grad_l2 --> 0.233 | Weights_l2 --> 11109.421 | Lr --> 0.008 | Seconds_per_step --> 6.348 | +[2024-09-26 21:19:56,477][Main][INFO] - [train] Step 8975 out of 20000 | Loss --> 1.890 | Grad_l2 --> 0.232 | Weights_l2 --> 11110.141 | Lr --> 0.008 | Seconds_per_step --> 6.362 | +[2024-09-26 21:22:35,356][Main][INFO] - [train] Step 9000 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.235 | Weights_l2 --> 11110.843 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 21:25:15,458][Main][INFO] - [train] Step 9025 out of 20000 | Loss --> 1.910 | Grad_l2 --> 0.236 | Weights_l2 --> 11111.542 | Lr --> 0.008 | Seconds_per_step --> 6.404 | +[2024-09-26 21:27:54,302][Main][INFO] - [train] Step 9050 out of 20000 | Loss --> 1.899 | Grad_l2 --> 0.241 | Weights_l2 --> 11112.235 | Lr --> 0.008 | Seconds_per_step --> 6.354 | +[2024-09-26 21:30:33,245][Main][INFO] - [train] Step 9075 out of 20000 | Loss --> 1.882 | Grad_l2 --> 0.239 | Weights_l2 --> 11112.925 | Lr --> 0.008 | Seconds_per_step --> 6.358 | +[2024-09-26 21:33:11,941][Main][INFO] - [train] Step 9100 out of 20000 | Loss --> 1.893 | Grad_l2 --> 0.279 | Weights_l2 --> 11113.605 | Lr --> 0.008 | Seconds_per_step --> 6.348 | +[2024-09-26 21:35:52,223][Main][INFO] - [train] Step 9125 out of 20000 | Loss --> 1.885 | Grad_l2 --> 0.232 | Weights_l2 --> 11114.290 | Lr --> 0.008 | Seconds_per_step 
--> 6.411 | +[2024-09-26 21:38:31,106][Main][INFO] - [train] Step 9150 out of 20000 | Loss --> 1.891 | Grad_l2 --> 0.237 | Weights_l2 --> 11114.981 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 21:41:09,893][Main][INFO] - [train] Step 9175 out of 20000 | Loss --> 1.882 | Grad_l2 --> 0.237 | Weights_l2 --> 11115.665 | Lr --> 0.008 | Seconds_per_step --> 6.351 | +[2024-09-26 21:43:48,620][Main][INFO] - [train] Step 9200 out of 20000 | Loss --> 1.882 | Grad_l2 --> 0.236 | Weights_l2 --> 11116.324 | Lr --> 0.008 | Seconds_per_step --> 6.349 | +[2024-09-26 21:46:29,905][Main][INFO] - [train] Step 9225 out of 20000 | Loss --> 1.889 | Grad_l2 --> 0.240 | Weights_l2 --> 11116.990 | Lr --> 0.008 | Seconds_per_step --> 6.451 | +[2024-09-26 21:49:08,771][Main][INFO] - [train] Step 9250 out of 20000 | Loss --> 1.889 | Grad_l2 --> 0.236 | Weights_l2 --> 11117.663 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 21:51:47,479][Main][INFO] - [train] Step 9275 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.239 | Weights_l2 --> 11118.341 | Lr --> 0.008 | Seconds_per_step --> 6.348 | +[2024-09-26 21:54:26,344][Main][INFO] - [train] Step 9300 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.237 | Weights_l2 --> 11118.979 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 21:57:05,124][Main][INFO] - [train] Step 9325 out of 20000 | Loss --> 1.892 | Grad_l2 --> 0.251 | Weights_l2 --> 11119.636 | Lr --> 0.008 | Seconds_per_step --> 6.351 | +[2024-09-26 21:59:45,684][Main][INFO] - [train] Step 9350 out of 20000 | Loss --> 1.876 | Grad_l2 --> 0.237 | Weights_l2 --> 11120.285 | Lr --> 0.008 | Seconds_per_step --> 6.422 | +[2024-09-26 22:02:25,001][Main][INFO] - [train] Step 9375 out of 20000 | Loss --> 1.895 | Grad_l2 --> 0.232 | Weights_l2 --> 11120.937 | Lr --> 0.008 | Seconds_per_step --> 6.373 | +[2024-09-26 22:05:03,816][Main][INFO] - [train] Step 9400 out of 20000 | Loss --> 1.877 | Grad_l2 --> 0.241 | Weights_l2 --> 11121.590 | Lr --> 0.008 | 
Seconds_per_step --> 6.353 | +[2024-09-26 22:07:42,599][Main][INFO] - [train] Step 9425 out of 20000 | Loss --> 1.867 | Grad_l2 --> 0.264 | Weights_l2 --> 11122.215 | Lr --> 0.008 | Seconds_per_step --> 6.351 | +[2024-09-26 22:10:23,365][Main][INFO] - [train] Step 9450 out of 20000 | Loss --> 1.876 | Grad_l2 --> 0.337 | Weights_l2 --> 11122.851 | Lr --> 0.008 | Seconds_per_step --> 6.431 | +[2024-09-26 22:13:02,360][Main][INFO] - [train] Step 9475 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.510 | Weights_l2 --> 11123.438 | Lr --> 0.008 | Seconds_per_step --> 6.360 | +[2024-09-26 22:15:41,189][Main][INFO] - [train] Step 9500 out of 20000 | Loss --> 1.871 | Grad_l2 --> 0.240 | Weights_l2 --> 11124.072 | Lr --> 0.008 | Seconds_per_step --> 6.353 | +[2024-09-26 22:18:20,208][Main][INFO] - [train] Step 9525 out of 20000 | Loss --> 1.874 | Grad_l2 --> 0.234 | Weights_l2 --> 11124.693 | Lr --> 0.008 | Seconds_per_step --> 6.361 | +[2024-09-26 22:21:00,719][Main][INFO] - [train] Step 9550 out of 20000 | Loss --> 1.864 | Grad_l2 --> 0.240 | Weights_l2 --> 11125.327 | Lr --> 0.008 | Seconds_per_step --> 6.420 | +[2024-09-26 22:23:39,698][Main][INFO] - [train] Step 9575 out of 20000 | Loss --> 1.865 | Grad_l2 --> 0.228 | Weights_l2 --> 11125.944 | Lr --> 0.008 | Seconds_per_step --> 6.359 | +[2024-09-26 22:26:18,748][Main][INFO] - [train] Step 9600 out of 20000 | Loss --> 1.863 | Grad_l2 --> 0.239 | Weights_l2 --> 11126.570 | Lr --> 0.008 | Seconds_per_step --> 6.362 | +[2024-09-26 22:28:57,735][Main][INFO] - [train] Step 9625 out of 20000 | Loss --> 1.866 | Grad_l2 --> 0.234 | Weights_l2 --> 11127.169 | Lr --> 0.008 | Seconds_per_step --> 6.359 | +[2024-09-26 22:31:38,405][Main][INFO] - [train] Step 9650 out of 20000 | Loss --> 1.868 | Grad_l2 --> 0.229 | Weights_l2 --> 11127.795 | Lr --> 0.008 | Seconds_per_step --> 6.427 | +[2024-09-26 22:34:17,285][Main][INFO] - [train] Step 9675 out of 20000 | Loss --> 1.875 | Grad_l2 --> 0.229 | Weights_l2 --> 11128.395 | Lr --> 0.008 
| Seconds_per_step --> 6.355 | +[2024-09-26 22:36:56,077][Main][INFO] - [train] Step 9700 out of 20000 | Loss --> 1.865 | Grad_l2 --> 0.233 | Weights_l2 --> 11129.004 | Lr --> 0.008 | Seconds_per_step --> 6.352 | +[2024-09-26 22:39:34,576][Main][INFO] - [train] Step 9725 out of 20000 | Loss --> 1.869 | Grad_l2 --> 0.235 | Weights_l2 --> 11129.600 | Lr --> 0.008 | Seconds_per_step --> 6.340 | +[2024-09-26 22:42:13,383][Main][INFO] - [train] Step 9750 out of 20000 | Loss --> 1.865 | Grad_l2 --> 0.237 | Weights_l2 --> 11130.190 | Lr --> 0.008 | Seconds_per_step --> 6.352 | +[2024-09-26 22:44:53,421][Main][INFO] - [train] Step 9775 out of 20000 | Loss --> 1.858 | Grad_l2 --> 0.241 | Weights_l2 --> 11130.774 | Lr --> 0.008 | Seconds_per_step --> 6.401 | +[2024-09-26 22:47:32,530][Main][INFO] - [train] Step 9800 out of 20000 | Loss --> 1.859 | Grad_l2 --> 0.237 | Weights_l2 --> 11131.374 | Lr --> 0.008 | Seconds_per_step --> 6.364 | +[2024-09-26 22:50:11,416][Main][INFO] - [train] Step 9825 out of 20000 | Loss --> 1.859 | Grad_l2 --> 0.234 | Weights_l2 --> 11131.977 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 22:52:50,376][Main][INFO] - [train] Step 9850 out of 20000 | Loss --> 1.867 | Grad_l2 --> 0.229 | Weights_l2 --> 11132.555 | Lr --> 0.008 | Seconds_per_step --> 6.358 | +[2024-09-26 22:55:30,672][Main][INFO] - [train] Step 9875 out of 20000 | Loss --> 1.868 | Grad_l2 --> 0.240 | Weights_l2 --> 11133.139 | Lr --> 0.008 | Seconds_per_step --> 6.412 | +[2024-09-26 22:58:09,404][Main][INFO] - [train] Step 9900 out of 20000 | Loss --> 1.849 | Grad_l2 --> 0.243 | Weights_l2 --> 11133.714 | Lr --> 0.008 | Seconds_per_step --> 6.349 | +[2024-09-26 23:00:48,118][Main][INFO] - [train] Step 9925 out of 20000 | Loss --> 1.857 | Grad_l2 --> 0.238 | Weights_l2 --> 11134.295 | Lr --> 0.008 | Seconds_per_step --> 6.348 | +[2024-09-26 23:03:27,011][Main][INFO] - [train] Step 9950 out of 20000 | Loss --> 1.852 | Grad_l2 --> 0.236 | Weights_l2 --> 11134.844 | Lr --> 
0.008 | Seconds_per_step --> 6.356 | +[2024-09-26 23:06:07,454][Main][INFO] - [train] Step 9975 out of 20000 | Loss --> 1.852 | Grad_l2 --> 0.229 | Weights_l2 --> 11135.424 | Lr --> 0.008 | Seconds_per_step --> 6.418 | +[2024-09-26 23:08:46,325][Main][INFO] - [train] Step 10000 out of 20000 | Loss --> 1.844 | Grad_l2 --> 0.239 | Weights_l2 --> 11135.967 | Lr --> 0.008 | Seconds_per_step --> 6.355 | +[2024-09-26 23:08:46,326][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000 +[2024-09-26 23:08:46,334][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-26 23:08:54,314][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors +[2024-09-26 23:09:03,823][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin +[2024-09-26 23:09:03,824][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin +[2024-09-26 23:09:03,825][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin +[2024-09-26 23:09:03,825][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin +[2024-09-26 23:09:03,826][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl +[2024-09-26 23:11:42,519][Main][INFO] - [train] Step 10025 out of 20000 | Loss --> 1.858 | Grad_l2 --> 0.231 | Weights_l2 --> 11136.558 | Lr --> 0.007 | Seconds_per_step --> 7.048 | +[2024-09-26 23:14:21,586][Main][INFO] - [train] Step 10050 out of 20000 | Loss --> 1.857 | Grad_l2 --> 0.232 | Weights_l2 --> 11137.107 | Lr --> 0.007 | Seconds_per_step --> 6.363 | +[2024-09-26 23:17:01,951][Main][INFO] - [train] Step 10075 out of 20000 | Loss --> 1.857 | Grad_l2 --> 0.233 | 
Weights_l2 --> 11137.685 | Lr --> 0.007 | Seconds_per_step --> 6.415 | +[2024-09-26 23:19:40,920][Main][INFO] - [train] Step 10100 out of 20000 | Loss --> 1.837 | Grad_l2 --> 0.235 | Weights_l2 --> 11138.227 | Lr --> 0.007 | Seconds_per_step --> 6.359 | +[2024-09-26 23:22:19,869][Main][INFO] - [train] Step 10125 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.243 | Weights_l2 --> 11138.776 | Lr --> 0.007 | Seconds_per_step --> 6.358 | +[2024-09-26 23:24:58,803][Main][INFO] - [train] Step 10150 out of 20000 | Loss --> 1.854 | Grad_l2 --> 0.227 | Weights_l2 --> 11139.326 | Lr --> 0.007 | Seconds_per_step --> 6.357 | +[2024-09-26 23:27:37,766][Main][INFO] - [train] Step 10175 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.233 | Weights_l2 --> 11139.868 | Lr --> 0.007 | Seconds_per_step --> 6.358 | +[2024-09-26 23:30:19,128][Main][INFO] - [train] Step 10200 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.226 | Weights_l2 --> 11140.411 | Lr --> 0.007 | Seconds_per_step --> 6.454 | +[2024-09-26 23:32:58,469][Main][INFO] - [train] Step 10225 out of 20000 | Loss --> 1.852 | Grad_l2 --> 0.228 | Weights_l2 --> 11140.932 | Lr --> 0.007 | Seconds_per_step --> 6.374 | +[2024-09-26 23:35:37,517][Main][INFO] - [train] Step 10250 out of 20000 | Loss --> 1.853 | Grad_l2 --> 0.349 | Weights_l2 --> 11141.450 | Lr --> 0.007 | Seconds_per_step --> 6.362 | +[2024-09-26 23:38:16,441][Main][INFO] - [train] Step 10275 out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.250 | Weights_l2 --> 11141.967 | Lr --> 0.007 | Seconds_per_step --> 6.357 | +[2024-09-26 23:40:56,772][Main][INFO] - [train] Step 10300 out of 20000 | Loss --> 1.845 | Grad_l2 --> 0.252 | Weights_l2 --> 11142.497 | Lr --> 0.007 | Seconds_per_step --> 6.413 | +[2024-09-26 23:43:35,684][Main][INFO] - [train] Step 10325 out of 20000 | Loss --> 1.851 | Grad_l2 --> 0.229 | Weights_l2 --> 11143.016 | Lr --> 0.007 | Seconds_per_step --> 6.356 | +[2024-09-26 23:46:14,557][Main][INFO] - [train] Step 10350 out of 20000 | Loss --> 1.838 | 
Grad_l2 --> 0.233 | Weights_l2 --> 11143.537 | Lr --> 0.007 | Seconds_per_step --> 6.355 | +[2024-09-26 23:48:53,426][Main][INFO] - [train] Step 10375 out of 20000 | Loss --> 1.845 | Grad_l2 --> 0.236 | Weights_l2 --> 11144.035 | Lr --> 0.007 | Seconds_per_step --> 6.355 | +[2024-09-26 23:51:34,187][Main][INFO] - [train] Step 10400 out of 20000 | Loss --> 1.844 | Grad_l2 --> 0.233 | Weights_l2 --> 11144.538 | Lr --> 0.007 | Seconds_per_step --> 6.430 | +[2024-09-26 23:54:13,664][Main][INFO] - [train] Step 10425 out of 20000 | Loss --> 1.851 | Grad_l2 --> 0.227 | Weights_l2 --> 11145.041 | Lr --> 0.007 | Seconds_per_step --> 6.379 | +[2024-09-26 23:56:52,677][Main][INFO] - [train] Step 10450 out of 20000 | Loss --> 1.841 | Grad_l2 --> 0.223 | Weights_l2 --> 11145.545 | Lr --> 0.007 | Seconds_per_step --> 6.360 | +[2024-09-26 23:59:31,594][Main][INFO] - [train] Step 10475 out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.223 | Weights_l2 --> 11146.035 | Lr --> 0.007 | Seconds_per_step --> 6.357 | +[2024-09-27 00:02:12,050][Main][INFO] - [train] Step 10500 out of 20000 | Loss --> 1.845 | Grad_l2 --> 0.224 | Weights_l2 --> 11146.527 | Lr --> 0.007 | Seconds_per_step --> 6.418 | +[2024-09-27 00:04:51,077][Main][INFO] - [train] Step 10525 out of 20000 | Loss --> 1.842 | Grad_l2 --> 0.229 | Weights_l2 --> 11147.019 | Lr --> 0.007 | Seconds_per_step --> 6.361 | +[2024-09-27 00:07:30,151][Main][INFO] - [train] Step 10550 out of 20000 | Loss --> 1.844 | Grad_l2 --> 0.223 | Weights_l2 --> 11147.505 | Lr --> 0.007 | Seconds_per_step --> 6.363 | +[2024-09-27 00:10:09,228][Main][INFO] - [train] Step 10575 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.234 | Weights_l2 --> 11147.990 | Lr --> 0.007 | Seconds_per_step --> 6.363 | +[2024-09-27 00:12:48,255][Main][INFO] - [train] Step 10600 out of 20000 | Loss --> 1.858 | Grad_l2 --> 0.609 | Weights_l2 --> 11148.447 | Lr --> 0.007 | Seconds_per_step --> 6.361 | +[2024-09-27 00:15:28,876][Main][INFO] - [train] Step 10625 out of 20000 | 
Loss --> 1.838 | Grad_l2 --> 0.240 | Weights_l2 --> 11148.907 | Lr --> 0.007 | Seconds_per_step --> 6.425 | +[2024-09-27 00:18:07,925][Main][INFO] - [train] Step 10650 out of 20000 | Loss --> 1.848 | Grad_l2 --> 0.226 | Weights_l2 --> 11149.386 | Lr --> 0.007 | Seconds_per_step --> 6.362 | +[2024-09-27 00:20:47,172][Main][INFO] - [train] Step 10675 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.226 | Weights_l2 --> 11149.846 | Lr --> 0.007 | Seconds_per_step --> 6.370 | +[2024-09-27 00:23:26,622][Main][INFO] - [train] Step 10700 out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.223 | Weights_l2 --> 11150.311 | Lr --> 0.007 | Seconds_per_step --> 6.378 | +[2024-09-27 00:26:07,466][Main][INFO] - [train] Step 10725 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.225 | Weights_l2 --> 11150.759 | Lr --> 0.007 | Seconds_per_step --> 6.434 | +[2024-09-27 00:28:46,560][Main][INFO] - [train] Step 10750 out of 20000 | Loss --> 1.837 | Grad_l2 --> 0.230 | Weights_l2 --> 11151.209 | Lr --> 0.007 | Seconds_per_step --> 6.364 | +[2024-09-27 00:31:25,571][Main][INFO] - [train] Step 10775 out of 20000 | Loss --> 1.855 | Grad_l2 --> 0.225 | Weights_l2 --> 11151.665 | Lr --> 0.007 | Seconds_per_step --> 6.360 | +[2024-09-27 00:34:04,750][Main][INFO] - [train] Step 10800 out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.224 | Weights_l2 --> 11152.110 | Lr --> 0.007 | Seconds_per_step --> 6.367 | +[2024-09-27 00:36:45,119][Main][INFO] - [train] Step 10825 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.223 | Weights_l2 --> 11152.555 | Lr --> 0.007 | Seconds_per_step --> 6.415 | +[2024-09-27 00:39:24,544][Main][INFO] - [train] Step 10850 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.226 | Weights_l2 --> 11153.015 | Lr --> 0.007 | Seconds_per_step --> 6.377 | +[2024-09-27 00:42:03,771][Main][INFO] - [train] Step 10875 out of 20000 | Loss --> 1.846 | Grad_l2 --> 0.234 | Weights_l2 --> 11153.455 | Lr --> 0.007 | Seconds_per_step --> 6.369 | +[2024-09-27 00:44:42,873][Main][INFO] - [train] Step 10900 
out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.230 | Weights_l2 --> 11153.893 | Lr --> 0.007 | Seconds_per_step --> 6.364 | +[2024-09-27 00:47:23,440][Main][INFO] - [train] Step 10925 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.224 | Weights_l2 --> 11154.332 | Lr --> 0.007 | Seconds_per_step --> 6.423 | +[2024-09-27 00:50:02,925][Main][INFO] - [train] Step 10950 out of 20000 | Loss --> 1.848 | Grad_l2 --> 0.234 | Weights_l2 --> 11154.780 | Lr --> 0.007 | Seconds_per_step --> 6.379 | +[2024-09-27 00:52:41,993][Main][INFO] - [train] Step 10975 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.238 | Weights_l2 --> 11155.208 | Lr --> 0.007 | Seconds_per_step --> 6.363 | +[2024-09-27 00:55:21,162][Main][INFO] - [train] Step 11000 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.228 | Weights_l2 --> 11155.638 | Lr --> 0.007 | Seconds_per_step --> 6.367 | +[2024-09-27 00:58:00,391][Main][INFO] - [train] Step 11025 out of 20000 | Loss --> 1.848 | Grad_l2 --> 0.231 | Weights_l2 --> 11156.067 | Lr --> 0.007 | Seconds_per_step --> 6.369 | +[2024-09-27 01:00:41,863][Main][INFO] - [train] Step 11050 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.233 | Weights_l2 --> 11156.474 | Lr --> 0.007 | Seconds_per_step --> 6.459 | +[2024-09-27 01:03:21,348][Main][INFO] - [train] Step 11075 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.242 | Weights_l2 --> 11156.899 | Lr --> 0.006 | Seconds_per_step --> 6.379 | +[2024-09-27 01:06:00,864][Main][INFO] - [train] Step 11100 out of 20000 | Loss --> 1.837 | Grad_l2 --> 0.248 | Weights_l2 --> 11157.319 | Lr --> 0.006 | Seconds_per_step --> 6.381 | +[2024-09-27 01:08:39,965][Main][INFO] - [train] Step 11125 out of 20000 | Loss --> 1.846 | Grad_l2 --> 0.228 | Weights_l2 --> 11157.744 | Lr --> 0.006 | Seconds_per_step --> 6.364 | +[2024-09-27 01:11:20,495][Main][INFO] - [train] Step 11150 out of 20000 | Loss --> 1.831 | Grad_l2 --> 0.224 | Weights_l2 --> 11158.149 | Lr --> 0.006 | Seconds_per_step --> 6.421 | +[2024-09-27 01:13:59,595][Main][INFO] - 
[train] Step 11175 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.226 | Weights_l2 --> 11158.552 | Lr --> 0.006 | Seconds_per_step --> 6.364 | +[2024-09-27 01:16:38,757][Main][INFO] - [train] Step 11200 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.239 | Weights_l2 --> 11158.959 | Lr --> 0.006 | Seconds_per_step --> 6.366 | +[2024-09-27 01:19:17,818][Main][INFO] - [train] Step 11225 out of 20000 | Loss --> 1.850 | Grad_l2 --> 0.224 | Weights_l2 --> 11159.353 | Lr --> 0.006 | Seconds_per_step --> 6.362 | +[2024-09-27 01:21:59,340][Main][INFO] - [train] Step 11250 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.224 | Weights_l2 --> 11159.750 | Lr --> 0.006 | Seconds_per_step --> 6.461 | +[2024-09-27 01:24:38,363][Main][INFO] - [train] Step 11275 out of 20000 | Loss --> 1.841 | Grad_l2 --> 0.221 | Weights_l2 --> 11160.137 | Lr --> 0.006 | Seconds_per_step --> 6.361 | +[2024-09-27 01:27:17,452][Main][INFO] - [train] Step 11300 out of 20000 | Loss --> 1.828 | Grad_l2 --> 0.221 | Weights_l2 --> 11160.521 | Lr --> 0.006 | Seconds_per_step --> 6.363 | +[2024-09-27 01:29:56,568][Main][INFO] - [train] Step 11325 out of 20000 | Loss --> 1.842 | Grad_l2 --> 0.230 | Weights_l2 --> 11160.919 | Lr --> 0.006 | Seconds_per_step --> 6.365 | +[2024-09-27 01:32:36,810][Main][INFO] - [train] Step 11350 out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.225 | Weights_l2 --> 11161.293 | Lr --> 0.006 | Seconds_per_step --> 6.410 | +[2024-09-27 01:35:15,731][Main][INFO] - [train] Step 11375 out of 20000 | Loss --> 1.845 | Grad_l2 --> 0.225 | Weights_l2 --> 11161.660 | Lr --> 0.006 | Seconds_per_step --> 6.357 | +[2024-09-27 01:37:54,717][Main][INFO] - [train] Step 11400 out of 20000 | Loss --> 1.828 | Grad_l2 --> 0.229 | Weights_l2 --> 11162.028 | Lr --> 0.006 | Seconds_per_step --> 6.359 | +[2024-09-27 01:40:33,628][Main][INFO] - [train] Step 11425 out of 20000 | Loss --> 1.837 | Grad_l2 --> 0.234 | Weights_l2 --> 11162.390 | Lr --> 0.006 | Seconds_per_step --> 6.356 | +[2024-09-27 
01:43:14,168][Main][INFO] - [train] Step 11450 out of 20000 | Loss --> 1.814 | Grad_l2 --> 0.230 | Weights_l2 --> 11162.764 | Lr --> 0.006 | Seconds_per_step --> 6.422 | +[2024-09-27 01:45:53,230][Main][INFO] - [train] Step 11475 out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.229 | Weights_l2 --> 11163.128 | Lr --> 0.006 | Seconds_per_step --> 6.362 | +[2024-09-27 01:48:32,269][Main][INFO] - [train] Step 11500 out of 20000 | Loss --> 1.830 | Grad_l2 --> 0.234 | Weights_l2 --> 11163.492 | Lr --> 0.006 | Seconds_per_step --> 6.362 | +[2024-09-27 01:51:10,939][Main][INFO] - [train] Step 11525 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.229 | Weights_l2 --> 11163.855 | Lr --> 0.006 | Seconds_per_step --> 6.347 | +[2024-09-27 01:53:49,994][Main][INFO] - [train] Step 11550 out of 20000 | Loss --> 1.837 | Grad_l2 --> 0.342 | Weights_l2 --> 11164.218 | Lr --> 0.006 | Seconds_per_step --> 6.362 | +[2024-09-27 01:56:31,441][Main][INFO] - [train] Step 11575 out of 20000 | Loss --> 1.834 | Grad_l2 --> 0.603 | Weights_l2 --> 11164.552 | Lr --> 0.006 | Seconds_per_step --> 6.458 | +[2024-09-27 01:59:10,410][Main][INFO] - [train] Step 11600 out of 20000 | Loss --> 1.848 | Grad_l2 --> 0.228 | Weights_l2 --> 11164.904 | Lr --> 0.006 | Seconds_per_step --> 6.359 | +[2024-09-27 02:01:49,503][Main][INFO] - [train] Step 11625 out of 20000 | Loss --> 1.835 | Grad_l2 --> 0.227 | Weights_l2 --> 11165.254 | Lr --> 0.006 | Seconds_per_step --> 6.364 | +[2024-09-27 02:04:28,209][Main][INFO] - [train] Step 11650 out of 20000 | Loss --> 1.822 | Grad_l2 --> 0.224 | Weights_l2 --> 11165.580 | Lr --> 0.006 | Seconds_per_step --> 6.348 | +[2024-09-27 02:07:08,514][Main][INFO] - [train] Step 11675 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.226 | Weights_l2 --> 11165.925 | Lr --> 0.006 | Seconds_per_step --> 6.412 | +[2024-09-27 02:09:47,402][Main][INFO] - [train] Step 11700 out of 20000 | Loss --> 1.822 | Grad_l2 --> 0.216 | Weights_l2 --> 11166.264 | Lr --> 0.006 | Seconds_per_step --> 6.355 | 
+[2024-09-27 02:12:26,185][Main][INFO] - [train] Step 11725 out of 20000 | Loss --> 1.820 | Grad_l2 --> 0.221 | Weights_l2 --> 11166.596 | Lr --> 0.006 | Seconds_per_step --> 6.351 | +[2024-09-27 02:15:05,065][Main][INFO] - [train] Step 11750 out of 20000 | Loss --> 1.820 | Grad_l2 --> 0.226 | Weights_l2 --> 11166.928 | Lr --> 0.006 | Seconds_per_step --> 6.355 | +[2024-09-27 02:17:45,528][Main][INFO] - [train] Step 11775 out of 20000 | Loss --> 1.821 | Grad_l2 --> 0.231 | Weights_l2 --> 11167.254 | Lr --> 0.006 | Seconds_per_step --> 6.418 | +[2024-09-27 02:20:24,411][Main][INFO] - [train] Step 11800 out of 20000 | Loss --> 1.814 | Grad_l2 --> 0.227 | Weights_l2 --> 11167.572 | Lr --> 0.006 | Seconds_per_step --> 6.355 | +[2024-09-27 02:23:03,445][Main][INFO] - [train] Step 11825 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.227 | Weights_l2 --> 11167.893 | Lr --> 0.006 | Seconds_per_step --> 6.361 | +[2024-09-27 02:25:42,386][Main][INFO] - [train] Step 11850 out of 20000 | Loss --> 1.831 | Grad_l2 --> 0.225 | Weights_l2 --> 11168.228 | Lr --> 0.006 | Seconds_per_step --> 6.358 | +[2024-09-27 02:28:22,908][Main][INFO] - [train] Step 11875 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.223 | Weights_l2 --> 11168.554 | Lr --> 0.006 | Seconds_per_step --> 6.421 | +[2024-09-27 02:31:01,804][Main][INFO] - [train] Step 11900 out of 20000 | Loss --> 1.824 | Grad_l2 --> 0.222 | Weights_l2 --> 11168.868 | Lr --> 0.006 | Seconds_per_step --> 6.356 | +[2024-09-27 02:33:40,594][Main][INFO] - [train] Step 11925 out of 20000 | Loss --> 1.823 | Grad_l2 --> 0.228 | Weights_l2 --> 11169.174 | Lr --> 0.006 | Seconds_per_step --> 6.352 | +[2024-09-27 02:36:19,335][Main][INFO] - [train] Step 11950 out of 20000 | Loss --> 1.815 | Grad_l2 --> 0.221 | Weights_l2 --> 11169.474 | Lr --> 0.006 | Seconds_per_step --> 6.350 | +[2024-09-27 02:38:58,467][Main][INFO] - [train] Step 11975 out of 20000 | Loss --> 1.825 | Grad_l2 --> 0.222 | Weights_l2 --> 11169.783 | Lr --> 0.006 | 
Seconds_per_step --> 6.365 | +[2024-09-27 02:41:39,782][Main][INFO] - [train] Step 12000 out of 20000 | Loss --> 1.808 | Grad_l2 --> 0.221 | Weights_l2 --> 11170.097 | Lr --> 0.006 | Seconds_per_step --> 6.453 | +[2024-09-27 02:44:19,122][Main][INFO] - [train] Step 12025 out of 20000 | Loss --> 1.825 | Grad_l2 --> 0.225 | Weights_l2 --> 11170.397 | Lr --> 0.006 | Seconds_per_step --> 6.374 | +[2024-09-27 02:46:58,543][Main][INFO] - [train] Step 12050 out of 20000 | Loss --> 1.821 | Grad_l2 --> 0.223 | Weights_l2 --> 11170.695 | Lr --> 0.005 | Seconds_per_step --> 6.377 | +[2024-09-27 02:49:37,836][Main][INFO] - [train] Step 12075 out of 20000 | Loss --> 1.821 | Grad_l2 --> 0.230 | Weights_l2 --> 11170.999 | Lr --> 0.005 | Seconds_per_step --> 6.372 | +[2024-09-27 02:52:18,582][Main][INFO] - [train] Step 12100 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.220 | Weights_l2 --> 11171.288 | Lr --> 0.005 | Seconds_per_step --> 6.430 | +[2024-09-27 02:54:57,704][Main][INFO] - [train] Step 12125 out of 20000 | Loss --> 1.829 | Grad_l2 --> 0.222 | Weights_l2 --> 11171.562 | Lr --> 0.005 | Seconds_per_step --> 6.365 | +[2024-09-27 02:57:36,707][Main][INFO] - [train] Step 12150 out of 20000 | Loss --> 1.810 | Grad_l2 --> 0.234 | Weights_l2 --> 11171.844 | Lr --> 0.005 | Seconds_per_step --> 6.360 | +[2024-09-27 03:00:15,923][Main][INFO] - [train] Step 12175 out of 20000 | Loss --> 1.814 | Grad_l2 --> 0.216 | Weights_l2 --> 11172.126 | Lr --> 0.005 | Seconds_per_step --> 6.369 | +[2024-09-27 03:02:56,747][Main][INFO] - [train] Step 12200 out of 20000 | Loss --> 1.822 | Grad_l2 --> 0.220 | Weights_l2 --> 11172.393 | Lr --> 0.005 | Seconds_per_step --> 6.433 | +[2024-09-27 03:05:36,023][Main][INFO] - [train] Step 12225 out of 20000 | Loss --> 1.823 | Grad_l2 --> 0.227 | Weights_l2 --> 11172.670 | Lr --> 0.005 | Seconds_per_step --> 6.371 | +[2024-09-27 03:08:15,048][Main][INFO] - [train] Step 12250 out of 20000 | Loss --> 1.819 | Grad_l2 --> 0.224 | Weights_l2 --> 11172.948 | 
Lr --> 0.005 | Seconds_per_step --> 6.361 | +[2024-09-27 03:10:53,822][Main][INFO] - [train] Step 12275 out of 20000 | Loss --> 1.827 | Grad_l2 --> 0.220 | Weights_l2 --> 11173.212 | Lr --> 0.005 | Seconds_per_step --> 6.351 | +[2024-09-27 03:13:34,755][Main][INFO] - [train] Step 12300 out of 20000 | Loss --> 1.805 | Grad_l2 --> 0.221 | Weights_l2 --> 11173.483 | Lr --> 0.005 | Seconds_per_step --> 6.437 | +[2024-09-27 03:16:13,857][Main][INFO] - [train] Step 12325 out of 20000 | Loss --> 1.821 | Grad_l2 --> 0.225 | Weights_l2 --> 11173.743 | Lr --> 0.005 | Seconds_per_step --> 6.364 | +[2024-09-27 03:18:52,886][Main][INFO] - [train] Step 12350 out of 20000 | Loss --> 1.828 | Grad_l2 --> 0.226 | Weights_l2 --> 11174.017 | Lr --> 0.005 | Seconds_per_step --> 6.361 | +[2024-09-27 03:21:31,991][Main][INFO] - [train] Step 12375 out of 20000 | Loss --> 1.812 | Grad_l2 --> 0.225 | Weights_l2 --> 11174.273 | Lr --> 0.005 | Seconds_per_step --> 6.364 | +[2024-09-27 03:24:12,648][Main][INFO] - [train] Step 12400 out of 20000 | Loss --> 1.814 | Grad_l2 --> 0.222 | Weights_l2 --> 11174.531 | Lr --> 0.005 | Seconds_per_step --> 6.426 | +[2024-09-27 03:26:51,429][Main][INFO] - [train] Step 12425 out of 20000 | Loss --> 1.818 | Grad_l2 --> 0.222 | Weights_l2 --> 11174.781 | Lr --> 0.005 | Seconds_per_step --> 6.351 | +[2024-09-27 03:29:30,463][Main][INFO] - [train] Step 12450 out of 20000 | Loss --> 1.815 | Grad_l2 --> 0.219 | Weights_l2 --> 11175.034 | Lr --> 0.005 | Seconds_per_step --> 6.361 | +[2024-09-27 03:32:09,537][Main][INFO] - [train] Step 12475 out of 20000 | Loss --> 1.802 | Grad_l2 --> 0.219 | Weights_l2 --> 11175.276 | Lr --> 0.005 | Seconds_per_step --> 6.363 | +[2024-09-27 03:34:48,566][Main][INFO] - [train] Step 12500 out of 20000 | Loss --> 1.821 | Grad_l2 --> 0.218 | Weights_l2 --> 11175.522 | Lr --> 0.005 | Seconds_per_step --> 6.361 | +[2024-09-27 03:34:48,567][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-12500 +[2024-09-27 
03:34:48,574][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-27 03:34:56,825][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-12500/model.safetensors +[2024-09-27 03:35:06,329][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-12500/optimizer.bin +[2024-09-27 03:35:06,330][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-12500/scheduler.bin +[2024-09-27 03:35:06,331][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-12500/sampler.bin +[2024-09-27 03:35:06,331][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-12500/sampler_1.bin +[2024-09-27 03:35:06,333][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-12500/random_states_0.pkl +[2024-09-27 03:37:46,508][Main][INFO] - [train] Step 12525 out of 20000 | Loss --> 1.815 | Grad_l2 --> 0.222 | Weights_l2 --> 11175.752 | Lr --> 0.005 | Seconds_per_step --> 7.118 | +[2024-09-27 03:40:25,646][Main][INFO] - [train] Step 12550 out of 20000 | Loss --> 1.817 | Grad_l2 --> 0.222 | Weights_l2 --> 11175.998 | Lr --> 0.005 | Seconds_per_step --> 6.365 | +[2024-09-27 03:43:04,612][Main][INFO] - [train] Step 12575 out of 20000 | Loss --> 1.803 | Grad_l2 --> 0.224 | Weights_l2 --> 11176.235 | Lr --> 0.005 | Seconds_per_step --> 6.359 | +[2024-09-27 03:45:43,579][Main][INFO] - [train] Step 12600 out of 20000 | Loss --> 1.806 | Grad_l2 --> 0.215 | Weights_l2 --> 11176.463 | Lr --> 0.005 | Seconds_per_step --> 6.359 | +[2024-09-27 03:48:23,937][Main][INFO] - [train] Step 12625 out of 20000 | Loss --> 1.807 | Grad_l2 --> 0.231 | Weights_l2 --> 11176.693 | Lr --> 0.005 | Seconds_per_step --> 6.414 | +[2024-09-27 03:51:02,860][Main][INFO] - [train] Step 12650 out of 20000 | Loss --> 1.801 
| Grad_l2 --> 0.220 | Weights_l2 --> 11176.923 | Lr --> 0.005 | Seconds_per_step --> 6.357 | +[2024-09-27 03:53:41,776][Main][INFO] - [train] Step 12675 out of 20000 | Loss --> 1.801 | Grad_l2 --> 0.221 | Weights_l2 --> 11177.151 | Lr --> 0.005 | Seconds_per_step --> 6.357 | +[2024-09-27 03:56:20,638][Main][INFO] - [train] Step 12700 out of 20000 | Loss --> 1.812 | Grad_l2 --> 0.225 | Weights_l2 --> 11177.374 | Lr --> 0.005 | Seconds_per_step --> 6.354 | +[2024-09-27 03:59:01,157][Main][INFO] - [train] Step 12725 out of 20000 | Loss --> 1.807 | Grad_l2 --> 0.219 | Weights_l2 --> 11177.587 | Lr --> 0.005 | Seconds_per_step --> 6.421 | +[2024-09-27 04:01:40,504][Main][INFO] - [train] Step 12750 out of 20000 | Loss --> 1.806 | Grad_l2 --> 0.220 | Weights_l2 --> 11177.816 | Lr --> 0.005 | Seconds_per_step --> 6.374 | +[2024-09-27 04:04:20,044][Main][INFO] - [train] Step 12775 out of 20000 | Loss --> 1.803 | Grad_l2 --> 0.221 | Weights_l2 --> 11178.024 | Lr --> 0.005 | Seconds_per_step --> 6.382 | +[2024-09-27 04:06:59,144][Main][INFO] - [train] Step 12800 out of 20000 | Loss --> 1.802 | Grad_l2 --> 0.232 | Weights_l2 --> 11178.231 | Lr --> 0.005 | Seconds_per_step --> 6.364 | +[2024-09-27 04:09:39,777][Main][INFO] - [train] Step 12825 out of 20000 | Loss --> 1.795 | Grad_l2 --> 0.223 | Weights_l2 --> 11178.439 | Lr --> 0.005 | Seconds_per_step --> 6.425 | +[2024-09-27 04:12:18,707][Main][INFO] - [train] Step 12850 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.219 | Weights_l2 --> 11178.646 | Lr --> 0.005 | Seconds_per_step --> 6.357 | +[2024-09-27 04:14:57,641][Main][INFO] - [train] Step 12875 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.219 | Weights_l2 --> 11178.840 | Lr --> 0.005 | Seconds_per_step --> 6.357 | +[2024-09-27 04:17:36,429][Main][INFO] - [train] Step 12900 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.227 | Weights_l2 --> 11179.050 | Lr --> 0.005 | Seconds_per_step --> 6.351 | +[2024-09-27 04:20:15,506][Main][INFO] - [train] Step 12925 out of 20000 | 
Loss --> 1.801 | Grad_l2 --> 0.221 | Weights_l2 --> 11179.257 | Lr --> 0.005 | Seconds_per_step --> 6.363 | +[2024-09-27 04:22:55,914][Main][INFO] - [train] Step 12950 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.218 | Weights_l2 --> 11179.454 | Lr --> 0.005 | Seconds_per_step --> 6.416 | +[2024-09-27 04:25:34,868][Main][INFO] - [train] Step 12975 out of 20000 | Loss --> 1.799 | Grad_l2 --> 0.222 | Weights_l2 --> 11179.648 | Lr --> 0.005 | Seconds_per_step --> 6.358 | +[2024-09-27 04:28:13,680][Main][INFO] - [train] Step 13000 out of 20000 | Loss --> 1.793 | Grad_l2 --> 0.222 | Weights_l2 --> 11179.854 | Lr --> 0.004 | Seconds_per_step --> 6.352 | +[2024-09-27 04:30:52,641][Main][INFO] - [train] Step 13025 out of 20000 | Loss --> 1.793 | Grad_l2 --> 0.219 | Weights_l2 --> 11180.041 | Lr --> 0.004 | Seconds_per_step --> 6.358 | +[2024-09-27 04:33:33,204][Main][INFO] - [train] Step 13050 out of 20000 | Loss --> 1.791 | Grad_l2 --> 0.218 | Weights_l2 --> 11180.229 | Lr --> 0.004 | Seconds_per_step --> 6.422 | +[2024-09-27 04:36:12,215][Main][INFO] - [train] Step 13075 out of 20000 | Loss --> 1.776 | Grad_l2 --> 0.220 | Weights_l2 --> 11180.406 | Lr --> 0.004 | Seconds_per_step --> 6.360 | +[2024-09-27 04:38:51,170][Main][INFO] - [train] Step 13100 out of 20000 | Loss --> 1.794 | Grad_l2 --> 0.223 | Weights_l2 --> 11180.591 | Lr --> 0.004 | Seconds_per_step --> 6.358 | +[2024-09-27 04:41:30,475][Main][INFO] - [train] Step 13125 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.222 | Weights_l2 --> 11180.771 | Lr --> 0.004 | Seconds_per_step --> 6.372 | +[2024-09-27 04:44:11,005][Main][INFO] - [train] Step 13150 out of 20000 | Loss --> 1.792 | Grad_l2 --> 0.221 | Weights_l2 --> 11180.952 | Lr --> 0.004 | Seconds_per_step --> 6.421 | +[2024-09-27 04:46:49,827][Main][INFO] - [train] Step 13175 out of 20000 | Loss --> 1.793 | Grad_l2 --> 0.217 | Weights_l2 --> 11181.127 | Lr --> 0.004 | Seconds_per_step --> 6.353 | +[2024-09-27 04:49:28,769][Main][INFO] - [train] Step 13200 
out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.219 | Weights_l2 --> 11181.299 | Lr --> 0.004 | Seconds_per_step --> 6.358 | +[2024-09-27 04:52:07,696][Main][INFO] - [train] Step 13225 out of 20000 | Loss --> 1.797 | Grad_l2 --> 0.226 | Weights_l2 --> 11181.484 | Lr --> 0.004 | Seconds_per_step --> 6.357 | +[2024-09-27 04:54:48,433][Main][INFO] - [train] Step 13250 out of 20000 | Loss --> 1.790 | Grad_l2 --> 0.223 | Weights_l2 --> 11181.653 | Lr --> 0.004 | Seconds_per_step --> 6.429 | +[2024-09-27 04:57:27,407][Main][INFO] - [train] Step 13275 out of 20000 | Loss --> 1.786 | Grad_l2 --> 0.224 | Weights_l2 --> 11181.824 | Lr --> 0.004 | Seconds_per_step --> 6.359 | +[2024-09-27 05:00:06,377][Main][INFO] - [train] Step 13300 out of 20000 | Loss --> 1.786 | Grad_l2 --> 0.226 | Weights_l2 --> 11181.987 | Lr --> 0.004 | Seconds_per_step --> 6.359 | +[2024-09-27 05:02:45,273][Main][INFO] - [train] Step 13325 out of 20000 | Loss --> 1.791 | Grad_l2 --> 0.221 | Weights_l2 --> 11182.171 | Lr --> 0.004 | Seconds_per_step --> 6.356 | +[2024-09-27 05:05:24,090][Main][INFO] - [train] Step 13350 out of 20000 | Loss --> 1.792 | Grad_l2 --> 0.390 | Weights_l2 --> 11182.317 | Lr --> 0.004 | Seconds_per_step --> 6.353 | +[2024-09-27 05:08:04,664][Main][INFO] - [train] Step 13375 out of 20000 | Loss --> 1.795 | Grad_l2 --> 0.224 | Weights_l2 --> 11182.484 | Lr --> 0.004 | Seconds_per_step --> 6.423 | +[2024-09-27 05:10:43,515][Main][INFO] - [train] Step 13400 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.218 | Weights_l2 --> 11182.634 | Lr --> 0.004 | Seconds_per_step --> 6.354 | +[2024-09-27 05:13:22,566][Main][INFO] - [train] Step 13425 out of 20000 | Loss --> 1.802 | Grad_l2 --> 0.219 | Weights_l2 --> 11182.783 | Lr --> 0.004 | Seconds_per_step --> 6.362 | +[2024-09-27 05:16:01,404][Main][INFO] - [train] Step 13450 out of 20000 | Loss --> 1.785 | Grad_l2 --> 0.218 | Weights_l2 --> 11182.929 | Lr --> 0.004 | Seconds_per_step --> 6.353 | +[2024-09-27 05:18:41,620][Main][INFO] - 
[train] Step 13475 out of 20000 | Loss --> 1.783 | Grad_l2 --> 0.220 | Weights_l2 --> 11183.094 | Lr --> 0.004 | Seconds_per_step --> 6.409 | +[2024-09-27 05:21:20,628][Main][INFO] - [train] Step 13500 out of 20000 | Loss --> 1.782 | Grad_l2 --> 0.224 | Weights_l2 --> 11183.237 | Lr --> 0.004 | Seconds_per_step --> 6.360 | +[2024-09-27 05:23:59,810][Main][INFO] - [train] Step 13525 out of 20000 | Loss --> 1.791 | Grad_l2 --> 0.216 | Weights_l2 --> 11183.389 | Lr --> 0.004 | Seconds_per_step --> 6.367 | +[2024-09-27 05:26:38,977][Main][INFO] - [train] Step 13550 out of 20000 | Loss --> 1.790 | Grad_l2 --> 0.217 | Weights_l2 --> 11183.527 | Lr --> 0.004 | Seconds_per_step --> 6.367 | +[2024-09-27 05:29:20,276][Main][INFO] - [train] Step 13575 out of 20000 | Loss --> 1.781 | Grad_l2 --> 0.219 | Weights_l2 --> 11183.666 | Lr --> 0.004 | Seconds_per_step --> 6.452 | +[2024-09-27 05:31:59,583][Main][INFO] - [train] Step 13600 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.229 | Weights_l2 --> 11183.804 | Lr --> 0.004 | Seconds_per_step --> 6.372 | +[2024-09-27 05:34:38,637][Main][INFO] - [train] Step 13625 out of 20000 | Loss --> 1.785 | Grad_l2 --> 0.220 | Weights_l2 --> 11183.943 | Lr --> 0.004 | Seconds_per_step --> 6.362 | +[2024-09-27 05:37:17,634][Main][INFO] - [train] Step 13650 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.216 | Weights_l2 --> 11184.090 | Lr --> 0.004 | Seconds_per_step --> 6.360 | +[2024-09-27 05:39:58,435][Main][INFO] - [train] Step 13675 out of 20000 | Loss --> 1.779 | Grad_l2 --> 0.214 | Weights_l2 --> 11184.219 | Lr --> 0.004 | Seconds_per_step --> 6.432 | +[2024-09-27 05:42:37,378][Main][INFO] - [train] Step 13700 out of 20000 | Loss --> 1.788 | Grad_l2 --> 0.216 | Weights_l2 --> 11184.345 | Lr --> 0.004 | Seconds_per_step --> 6.358 | +[2024-09-27 05:45:16,150][Main][INFO] - [train] Step 13725 out of 20000 | Loss --> 1.785 | Grad_l2 --> 0.223 | Weights_l2 --> 11184.472 | Lr --> 0.004 | Seconds_per_step --> 6.351 | +[2024-09-27 
05:47:54,821][Main][INFO] - [train] Step 13750 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.224 | Weights_l2 --> 11184.604 | Lr --> 0.004 | Seconds_per_step --> 6.347 | +[2024-09-27 05:50:33,838][Main][INFO] - [train] Step 13775 out of 20000 | Loss --> 1.785 | Grad_l2 --> 0.227 | Weights_l2 --> 11184.732 | Lr --> 0.004 | Seconds_per_step --> 6.361 | +[2024-09-27 05:53:14,342][Main][INFO] - [train] Step 13800 out of 20000 | Loss --> 1.774 | Grad_l2 --> 0.216 | Weights_l2 --> 11184.847 | Lr --> 0.004 | Seconds_per_step --> 6.420 | +[2024-09-27 05:55:53,267][Main][INFO] - [train] Step 13825 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.220 | Weights_l2 --> 11184.979 | Lr --> 0.004 | Seconds_per_step --> 6.357 | +[2024-09-27 05:58:32,309][Main][INFO] - [train] Step 13850 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.220 | Weights_l2 --> 11185.098 | Lr --> 0.004 | Seconds_per_step --> 6.362 | +[2024-09-27 06:01:11,190][Main][INFO] - [train] Step 13875 out of 20000 | Loss --> 1.762 | Grad_l2 --> 0.224 | Weights_l2 --> 11185.224 | Lr --> 0.004 | Seconds_per_step --> 6.355 | +[2024-09-27 06:03:52,005][Main][INFO] - [train] Step 13900 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.216 | Weights_l2 --> 11185.347 | Lr --> 0.004 | Seconds_per_step --> 6.433 | +[2024-09-27 06:06:31,261][Main][INFO] - [train] Step 13925 out of 20000 | Loss --> 1.771 | Grad_l2 --> 0.222 | Weights_l2 --> 11185.458 | Lr --> 0.004 | Seconds_per_step --> 6.370 | +[2024-09-27 06:09:10,227][Main][INFO] - [train] Step 13950 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.221 | Weights_l2 --> 11185.577 | Lr --> 0.004 | Seconds_per_step --> 6.359 | +[2024-09-27 06:11:49,369][Main][INFO] - [train] Step 13975 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.216 | Weights_l2 --> 11185.690 | Lr --> 0.003 | Seconds_per_step --> 6.366 | +[2024-09-27 06:14:30,065][Main][INFO] - [train] Step 14000 out of 20000 | Loss --> 1.790 | Grad_l2 --> 0.228 | Weights_l2 --> 11185.791 | Lr --> 0.003 | Seconds_per_step --> 6.428 | 
+[2024-09-27 06:17:09,215][Main][INFO] - [train] Step 14025 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.218 | Weights_l2 --> 11185.903 | Lr --> 0.003 | Seconds_per_step --> 6.366 | +[2024-09-27 06:19:48,215][Main][INFO] - [train] Step 14050 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.217 | Weights_l2 --> 11186.010 | Lr --> 0.003 | Seconds_per_step --> 6.360 | +[2024-09-27 06:22:27,210][Main][INFO] - [train] Step 14075 out of 20000 | Loss --> 1.761 | Grad_l2 --> 0.218 | Weights_l2 --> 11186.113 | Lr --> 0.003 | Seconds_per_step --> 6.360 | +[2024-09-27 06:25:07,819][Main][INFO] - [train] Step 14100 out of 20000 | Loss --> 1.775 | Grad_l2 --> 0.219 | Weights_l2 --> 11186.216 | Lr --> 0.003 | Seconds_per_step --> 6.424 | +[2024-09-27 06:27:46,797][Main][INFO] - [train] Step 14125 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.227 | Weights_l2 --> 11186.317 | Lr --> 0.003 | Seconds_per_step --> 6.359 | +[2024-09-27 06:30:26,001][Main][INFO] - [train] Step 14150 out of 20000 | Loss --> 1.774 | Grad_l2 --> 0.220 | Weights_l2 --> 11186.428 | Lr --> 0.003 | Seconds_per_step --> 6.368 | +[2024-09-27 06:33:04,910][Main][INFO] - [train] Step 14175 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.222 | Weights_l2 --> 11186.532 | Lr --> 0.003 | Seconds_per_step --> 6.356 | +[2024-09-27 06:35:43,971][Main][INFO] - [train] Step 14200 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.217 | Weights_l2 --> 11186.627 | Lr --> 0.003 | Seconds_per_step --> 6.362 | +[2024-09-27 06:38:24,547][Main][INFO] - [train] Step 14225 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.223 | Weights_l2 --> 11186.716 | Lr --> 0.003 | Seconds_per_step --> 6.423 | +[2024-09-27 06:41:03,494][Main][INFO] - [train] Step 14250 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.226 | Weights_l2 --> 11186.811 | Lr --> 0.003 | Seconds_per_step --> 6.358 | +[2024-09-27 06:43:42,572][Main][INFO] - [train] Step 14275 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.227 | Weights_l2 --> 11186.899 | Lr --> 0.003 | 
Seconds_per_step --> 6.363 | +[2024-09-27 06:46:21,722][Main][INFO] - [train] Step 14300 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.223 | Weights_l2 --> 11186.992 | Lr --> 0.003 | Seconds_per_step --> 6.366 | +[2024-09-27 06:49:02,385][Main][INFO] - [train] Step 14325 out of 20000 | Loss --> 1.752 | Grad_l2 --> 0.217 | Weights_l2 --> 11187.080 | Lr --> 0.003 | Seconds_per_step --> 6.426 | +[2024-09-27 06:51:41,302][Main][INFO] - [train] Step 14350 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.215 | Weights_l2 --> 11187.177 | Lr --> 0.003 | Seconds_per_step --> 6.357 | +[2024-09-27 06:54:20,188][Main][INFO] - [train] Step 14375 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.214 | Weights_l2 --> 11187.255 | Lr --> 0.003 | Seconds_per_step --> 6.355 | +[2024-09-27 06:56:59,187][Main][INFO] - [train] Step 14400 out of 20000 | Loss --> 1.748 | Grad_l2 --> 0.216 | Weights_l2 --> 11187.336 | Lr --> 0.003 | Seconds_per_step --> 6.360 | +[2024-09-27 06:59:39,340][Main][INFO] - [train] Step 14425 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.216 | Weights_l2 --> 11187.418 | Lr --> 0.003 | Seconds_per_step --> 6.406 | +[2024-09-27 07:02:18,402][Main][INFO] - [train] Step 14450 out of 20000 | Loss --> 1.752 | Grad_l2 --> 0.220 | Weights_l2 --> 11187.501 | Lr --> 0.003 | Seconds_per_step --> 6.362 | +[2024-09-27 07:04:57,485][Main][INFO] - [train] Step 14475 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.215 | Weights_l2 --> 11187.585 | Lr --> 0.003 | Seconds_per_step --> 6.363 | +[2024-09-27 07:07:36,515][Main][INFO] - [train] Step 14500 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.216 | Weights_l2 --> 11187.660 | Lr --> 0.003 | Seconds_per_step --> 6.361 | +[2024-09-27 07:10:17,191][Main][INFO] - [train] Step 14525 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.221 | Weights_l2 --> 11187.734 | Lr --> 0.003 | Seconds_per_step --> 6.427 | +[2024-09-27 07:12:56,357][Main][INFO] - [train] Step 14550 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.214 | Weights_l2 --> 11187.803 | 
Lr --> 0.003 | Seconds_per_step --> 6.367 | +[2024-09-27 07:15:35,243][Main][INFO] - [train] Step 14575 out of 20000 | Loss --> 1.743 | Grad_l2 --> 0.220 | Weights_l2 --> 11187.883 | Lr --> 0.003 | Seconds_per_step --> 6.355 | +[2024-09-27 07:18:14,207][Main][INFO] - [train] Step 14600 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.221 | Weights_l2 --> 11187.948 | Lr --> 0.003 | Seconds_per_step --> 6.358 | +[2024-09-27 07:20:54,992][Main][INFO] - [train] Step 14625 out of 20000 | Loss --> 1.751 | Grad_l2 --> 0.217 | Weights_l2 --> 11188.021 | Lr --> 0.003 | Seconds_per_step --> 6.431 | +[2024-09-27 07:23:34,364][Main][INFO] - [train] Step 14650 out of 20000 | Loss --> 1.750 | Grad_l2 --> 0.213 | Weights_l2 --> 11188.102 | Lr --> 0.003 | Seconds_per_step --> 6.375 | +[2024-09-27 07:26:13,416][Main][INFO] - [train] Step 14675 out of 20000 | Loss --> 1.750 | Grad_l2 --> 0.216 | Weights_l2 --> 11188.157 | Lr --> 0.003 | Seconds_per_step --> 6.362 | +[2024-09-27 07:28:52,313][Main][INFO] - [train] Step 14700 out of 20000 | Loss --> 1.752 | Grad_l2 --> 0.214 | Weights_l2 --> 11188.225 | Lr --> 0.003 | Seconds_per_step --> 6.356 | +[2024-09-27 07:31:31,149][Main][INFO] - [train] Step 14725 out of 20000 | Loss --> 1.756 | Grad_l2 --> 0.219 | Weights_l2 --> 11188.287 | Lr --> 0.003 | Seconds_per_step --> 6.353 | +[2024-09-27 07:34:11,824][Main][INFO] - [train] Step 14750 out of 20000 | Loss --> 1.755 | Grad_l2 --> 0.216 | Weights_l2 --> 11188.361 | Lr --> 0.003 | Seconds_per_step --> 6.427 | +[2024-09-27 07:36:51,084][Main][INFO] - [train] Step 14775 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.215 | Weights_l2 --> 11188.425 | Lr --> 0.003 | Seconds_per_step --> 6.370 | +[2024-09-27 07:39:30,152][Main][INFO] - [train] Step 14800 out of 20000 | Loss --> 1.747 | Grad_l2 --> 0.221 | Weights_l2 --> 11188.488 | Lr --> 0.003 | Seconds_per_step --> 6.363 | +[2024-09-27 07:42:09,378][Main][INFO] - [train] Step 14825 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.214 | Weights_l2 
--> 11188.551 | Lr --> 0.003 | Seconds_per_step --> 6.369 | +[2024-09-27 07:44:50,207][Main][INFO] - [train] Step 14850 out of 20000 | Loss --> 1.761 | Grad_l2 --> 0.214 | Weights_l2 --> 11188.617 | Lr --> 0.003 | Seconds_per_step --> 6.433 | +[2024-09-27 07:47:28,891][Main][INFO] - [train] Step 14875 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.229 | Weights_l2 --> 11188.678 | Lr --> 0.003 | Seconds_per_step --> 6.347 | +[2024-09-27 07:50:08,027][Main][INFO] - [train] Step 14900 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.215 | Weights_l2 --> 11188.736 | Lr --> 0.003 | Seconds_per_step --> 6.365 | +[2024-09-27 07:52:47,117][Main][INFO] - [train] Step 14925 out of 20000 | Loss --> 1.744 | Grad_l2 --> 0.217 | Weights_l2 --> 11188.792 | Lr --> 0.003 | Seconds_per_step --> 6.364 | +[2024-09-27 07:55:27,743][Main][INFO] - [train] Step 14950 out of 20000 | Loss --> 1.743 | Grad_l2 --> 0.217 | Weights_l2 --> 11188.851 | Lr --> 0.003 | Seconds_per_step --> 6.425 | +[2024-09-27 07:58:06,992][Main][INFO] - [train] Step 14975 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.214 | Weights_l2 --> 11188.901 | Lr --> 0.003 | Seconds_per_step --> 6.370 | +[2024-09-27 08:00:45,706][Main][INFO] - [train] Step 15000 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.213 | Weights_l2 --> 11188.946 | Lr --> 0.003 | Seconds_per_step --> 6.348 | +[2024-09-27 08:00:45,707][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-15000 +[2024-09-27 08:00:45,714][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-27 08:00:53,778][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-15000/model.safetensors +[2024-09-27 08:01:03,600][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-15000/optimizer.bin +[2024-09-27 08:01:03,601][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-15000/scheduler.bin +[2024-09-27 08:01:03,601][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-15000/sampler.bin +[2024-09-27 08:01:03,602][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-15000/sampler_1.bin +[2024-09-27 08:01:03,603][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-15000/random_states_0.pkl +[2024-09-27 08:03:42,448][Main][INFO] - [train] Step 15025 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.219 | Weights_l2 --> 11189.002 | Lr --> 0.002 | Seconds_per_step --> 7.070 | +[2024-09-27 08:06:22,738][Main][INFO] - [train] Step 15050 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.215 | Weights_l2 --> 11189.048 | Lr --> 0.002 | Seconds_per_step --> 6.412 | +[2024-09-27 08:09:01,795][Main][INFO] - [train] Step 15075 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.216 | Weights_l2 --> 11189.099 | Lr --> 0.002 | Seconds_per_step --> 6.362 | +[2024-09-27 08:11:40,894][Main][INFO] - [train] Step 15100 out of 20000 | Loss --> 1.747 | Grad_l2 --> 0.215 | Weights_l2 --> 11189.141 | Lr --> 0.002 | Seconds_per_step --> 6.364 | +[2024-09-27 08:14:20,424][Main][INFO] - [train] Step 15125 out of 20000 | Loss --> 1.737 | Grad_l2 --> 0.216 | Weights_l2 --> 11189.186 | Lr --> 0.002 | Seconds_per_step --> 6.381 | +[2024-09-27 08:16:59,572][Main][INFO] - [train] Step 15150 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.215 | Weights_l2 --> 11189.239 | Lr --> 0.002 | Seconds_per_step --> 6.366 | +[2024-09-27 08:19:40,520][Main][INFO] - [train] Step 
15175 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.221 | Weights_l2 --> 11189.285 | Lr --> 0.002 | Seconds_per_step --> 6.438 | +[2024-09-27 08:22:19,633][Main][INFO] - [train] Step 15200 out of 20000 | Loss --> 1.731 | Grad_l2 --> 0.216 | Weights_l2 --> 11189.330 | Lr --> 0.002 | Seconds_per_step --> 6.364 | +[2024-09-27 08:24:58,493][Main][INFO] - [train] Step 15225 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.231 | Weights_l2 --> 11189.372 | Lr --> 0.002 | Seconds_per_step --> 6.354 | +[2024-09-27 08:27:37,656][Main][INFO] - [train] Step 15250 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.215 | Weights_l2 --> 11189.419 | Lr --> 0.002 | Seconds_per_step --> 6.366 | +[2024-09-27 08:30:18,265][Main][INFO] - [train] Step 15275 out of 20000 | Loss --> 1.743 | Grad_l2 --> 0.216 | Weights_l2 --> 11189.457 | Lr --> 0.002 | Seconds_per_step --> 6.424 | +[2024-09-27 08:32:57,046][Main][INFO] - [train] Step 15300 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.218 | Weights_l2 --> 11189.499 | Lr --> 0.002 | Seconds_per_step --> 6.351 | +[2024-09-27 08:35:35,932][Main][INFO] - [train] Step 15325 out of 20000 | Loss --> 1.733 | Grad_l2 --> 0.215 | Weights_l2 --> 11189.532 | Lr --> 0.002 | Seconds_per_step --> 6.355 | +[2024-09-27 08:38:15,012][Main][INFO] - [train] Step 15350 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.218 | Weights_l2 --> 11189.574 | Lr --> 0.002 | Seconds_per_step --> 6.363 | +[2024-09-27 08:40:55,684][Main][INFO] - [train] Step 15375 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.213 | Weights_l2 --> 11189.616 | Lr --> 0.002 | Seconds_per_step --> 6.427 | +[2024-09-27 08:43:34,647][Main][INFO] - [train] Step 15400 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.216 | Weights_l2 --> 11189.651 | Lr --> 0.002 | Seconds_per_step --> 6.358 | +[2024-09-27 08:46:13,657][Main][INFO] - [train] Step 15425 out of 20000 | Loss --> 1.733 | Grad_l2 --> 0.222 | Weights_l2 --> 11189.679 | Lr --> 0.002 | Seconds_per_step --> 6.360 | +[2024-09-27 08:48:52,557][Main][INFO] - 
[train] Step 15450 out of 20000 | Loss --> 1.740 | Grad_l2 --> 0.219 | Weights_l2 --> 11189.717 | Lr --> 0.002 | Seconds_per_step --> 6.356 | +[2024-09-27 08:51:33,073][Main][INFO] - [train] Step 15475 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.216 | Weights_l2 --> 11189.750 | Lr --> 0.002 | Seconds_per_step --> 6.421 | +[2024-09-27 08:54:12,071][Main][INFO] - [train] Step 15500 out of 20000 | Loss --> 1.731 | Grad_l2 --> 0.214 | Weights_l2 --> 11189.790 | Lr --> 0.002 | Seconds_per_step --> 6.360 | +[2024-09-27 08:56:51,354][Main][INFO] - [train] Step 15525 out of 20000 | Loss --> 1.737 | Grad_l2 --> 0.219 | Weights_l2 --> 11189.820 | Lr --> 0.002 | Seconds_per_step --> 6.371 | +[2024-09-27 08:59:30,616][Main][INFO] - [train] Step 15550 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.214 | Weights_l2 --> 11189.848 | Lr --> 0.002 | Seconds_per_step --> 6.370 | +[2024-09-27 09:02:09,738][Main][INFO] - [train] Step 15575 out of 20000 | Loss --> 1.737 | Grad_l2 --> 0.217 | Weights_l2 --> 11189.882 | Lr --> 0.002 | Seconds_per_step --> 6.365 | +[2024-09-27 09:04:50,378][Main][INFO] - [train] Step 15600 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.214 | Weights_l2 --> 11189.912 | Lr --> 0.002 | Seconds_per_step --> 6.426 | +[2024-09-27 09:07:29,349][Main][INFO] - [train] Step 15625 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.215 | Weights_l2 --> 11189.948 | Lr --> 0.002 | Seconds_per_step --> 6.359 | +[2024-09-27 09:10:08,346][Main][INFO] - [train] Step 15650 out of 20000 | Loss --> 1.740 | Grad_l2 --> 0.214 | Weights_l2 --> 11189.978 | Lr --> 0.002 | Seconds_per_step --> 6.360 | +[2024-09-27 09:12:47,425][Main][INFO] - [train] Step 15675 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.000 | Lr --> 0.002 | Seconds_per_step --> 6.363 | +[2024-09-27 09:15:27,678][Main][INFO] - [train] Step 15700 out of 20000 | Loss --> 1.731 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.026 | Lr --> 0.002 | Seconds_per_step --> 6.410 | +[2024-09-27 
09:18:06,649][Main][INFO] - [train] Step 15725 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.218 | Weights_l2 --> 11190.060 | Lr --> 0.002 | Seconds_per_step --> 6.359 | +[2024-09-27 09:20:45,622][Main][INFO] - [train] Step 15750 out of 20000 | Loss --> 1.742 | Grad_l2 --> 0.224 | Weights_l2 --> 11190.082 | Lr --> 0.002 | Seconds_per_step --> 6.359 | +[2024-09-27 09:23:24,723][Main][INFO] - [train] Step 15775 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.227 | Weights_l2 --> 11190.112 | Lr --> 0.002 | Seconds_per_step --> 6.364 | +[2024-09-27 09:26:05,301][Main][INFO] - [train] Step 15800 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.131 | Lr --> 0.002 | Seconds_per_step --> 6.423 | +[2024-09-27 09:28:44,274][Main][INFO] - [train] Step 15825 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.217 | Weights_l2 --> 11190.156 | Lr --> 0.002 | Seconds_per_step --> 6.359 | +[2024-09-27 09:31:23,677][Main][INFO] - [train] Step 15850 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.217 | Weights_l2 --> 11190.178 | Lr --> 0.002 | Seconds_per_step --> 6.376 | +[2024-09-27 09:34:03,033][Main][INFO] - [train] Step 15875 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.200 | Lr --> 0.002 | Seconds_per_step --> 6.374 | +[2024-09-27 09:36:43,719][Main][INFO] - [train] Step 15900 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.223 | Lr --> 0.002 | Seconds_per_step --> 6.427 | +[2024-09-27 09:39:22,813][Main][INFO] - [train] Step 15925 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.246 | Lr --> 0.002 | Seconds_per_step --> 6.364 | +[2024-09-27 09:42:01,934][Main][INFO] - [train] Step 15950 out of 20000 | Loss --> 1.733 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.270 | Lr --> 0.002 | Seconds_per_step --> 6.365 | +[2024-09-27 09:44:40,979][Main][INFO] - [train] Step 15975 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.218 | Weights_l2 --> 11190.288 | Lr --> 0.002 | Seconds_per_step --> 6.362 | 
+[2024-09-27 09:47:19,970][Main][INFO] - [train] Step 16000 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.307 | Lr --> 0.002 | Seconds_per_step --> 6.360 | +[2024-09-27 09:50:00,612][Main][INFO] - [train] Step 16025 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.322 | Lr --> 0.002 | Seconds_per_step --> 6.426 | +[2024-09-27 09:52:39,606][Main][INFO] - [train] Step 16050 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.221 | Weights_l2 --> 11190.342 | Lr --> 0.002 | Seconds_per_step --> 6.360 | +[2024-09-27 09:55:18,484][Main][INFO] - [train] Step 16075 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.361 | Lr --> 0.002 | Seconds_per_step --> 6.355 | +[2024-09-27 09:57:57,410][Main][INFO] - [train] Step 16100 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.218 | Weights_l2 --> 11190.381 | Lr --> 0.002 | Seconds_per_step --> 6.357 | +[2024-09-27 10:00:38,135][Main][INFO] - [train] Step 16125 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.396 | Lr --> 0.002 | Seconds_per_step --> 6.429 | +[2024-09-27 10:03:17,076][Main][INFO] - [train] Step 16150 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.413 | Lr --> 0.002 | Seconds_per_step --> 6.358 | +[2024-09-27 10:05:55,994][Main][INFO] - [train] Step 16175 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.423 | Lr --> 0.002 | Seconds_per_step --> 6.357 | +[2024-09-27 10:08:34,951][Main][INFO] - [train] Step 16200 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.219 | Weights_l2 --> 11190.442 | Lr --> 0.002 | Seconds_per_step --> 6.358 | +[2024-09-27 10:11:15,735][Main][INFO] - [train] Step 16225 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.223 | Weights_l2 --> 11190.457 | Lr --> 0.002 | Seconds_per_step --> 6.431 | +[2024-09-27 10:13:54,798][Main][INFO] - [train] Step 16250 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.473 | Lr --> 0.001 | 
Seconds_per_step --> 6.362 | +[2024-09-27 10:16:33,869][Main][INFO] - [train] Step 16275 out of 20000 | Loss --> 1.709 | Grad_l2 --> 0.217 | Weights_l2 --> 11190.490 | Lr --> 0.001 | Seconds_per_step --> 6.363 | +[2024-09-27 10:19:12,786][Main][INFO] - [train] Step 16300 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.499 | Lr --> 0.001 | Seconds_per_step --> 6.357 | +[2024-09-27 10:21:53,326][Main][INFO] - [train] Step 16325 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.218 | Weights_l2 --> 11190.511 | Lr --> 0.001 | Seconds_per_step --> 6.422 | +[2024-09-27 10:24:32,169][Main][INFO] - [train] Step 16350 out of 20000 | Loss --> 1.708 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.526 | Lr --> 0.001 | Seconds_per_step --> 6.354 | +[2024-09-27 10:27:11,178][Main][INFO] - [train] Step 16375 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.542 | Lr --> 0.001 | Seconds_per_step --> 6.360 | +[2024-09-27 10:29:50,308][Main][INFO] - [train] Step 16400 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.550 | Lr --> 0.001 | Seconds_per_step --> 6.365 | +[2024-09-27 10:32:29,385][Main][INFO] - [train] Step 16425 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.559 | Lr --> 0.001 | Seconds_per_step --> 6.363 | +[2024-09-27 10:35:09,933][Main][INFO] - [train] Step 16450 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.569 | Lr --> 0.001 | Seconds_per_step --> 6.422 | +[2024-09-27 10:37:49,100][Main][INFO] - [train] Step 16475 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.581 | Lr --> 0.001 | Seconds_per_step --> 6.367 | +[2024-09-27 10:40:27,979][Main][INFO] - [train] Step 16500 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.588 | Lr --> 0.001 | Seconds_per_step --> 6.355 | +[2024-09-27 10:43:06,847][Main][INFO] - [train] Step 16525 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.217 | Weights_l2 --> 11190.597 | 
Lr --> 0.001 | Seconds_per_step --> 6.355 | +[2024-09-27 10:45:47,125][Main][INFO] - [train] Step 16550 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.602 | Lr --> 0.001 | Seconds_per_step --> 6.411 | +[2024-09-27 10:48:26,101][Main][INFO] - [train] Step 16575 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.608 | Lr --> 0.001 | Seconds_per_step --> 6.359 | +[2024-09-27 10:51:04,844][Main][INFO] - [train] Step 16600 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.616 | Lr --> 0.001 | Seconds_per_step --> 6.350 | +[2024-09-27 10:53:43,814][Main][INFO] - [train] Step 16625 out of 20000 | Loss --> 1.709 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.626 | Lr --> 0.001 | Seconds_per_step --> 6.359 | +[2024-09-27 10:56:24,306][Main][INFO] - [train] Step 16650 out of 20000 | Loss --> 1.706 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.632 | Lr --> 0.001 | Seconds_per_step --> 6.420 | +[2024-09-27 10:59:03,014][Main][INFO] - [train] Step 16675 out of 20000 | Loss --> 1.705 | Grad_l2 --> 0.219 | Weights_l2 --> 11190.640 | Lr --> 0.001 | Seconds_per_step --> 6.348 | +[2024-09-27 11:01:42,043][Main][INFO] - [train] Step 16700 out of 20000 | Loss --> 1.695 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.644 | Lr --> 0.001 | Seconds_per_step --> 6.361 | +[2024-09-27 11:04:20,983][Main][INFO] - [train] Step 16725 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.649 | Lr --> 0.001 | Seconds_per_step --> 6.358 | +[2024-09-27 11:07:01,370][Main][INFO] - [train] Step 16750 out of 20000 | Loss --> 1.703 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.655 | Lr --> 0.001 | Seconds_per_step --> 6.415 | +[2024-09-27 11:09:40,438][Main][INFO] - [train] Step 16775 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.660 | Lr --> 0.001 | Seconds_per_step --> 6.363 | +[2024-09-27 11:12:19,394][Main][INFO] - [train] Step 16800 out of 20000 | Loss --> 1.705 | Grad_l2 --> 0.214 | Weights_l2 
--> 11190.665 | Lr --> 0.001 | Seconds_per_step --> 6.358 | +[2024-09-27 11:14:58,435][Main][INFO] - [train] Step 16825 out of 20000 | Loss --> 1.702 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.664 | Lr --> 0.001 | Seconds_per_step --> 6.362 | +[2024-09-27 11:17:37,198][Main][INFO] - [train] Step 16850 out of 20000 | Loss --> 1.704 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.669 | Lr --> 0.001 | Seconds_per_step --> 6.350 | +[2024-09-27 11:20:17,705][Main][INFO] - [train] Step 16875 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.679 | Lr --> 0.001 | Seconds_per_step --> 6.420 | +[2024-09-27 11:22:56,549][Main][INFO] - [train] Step 16900 out of 20000 | Loss --> 1.687 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.685 | Lr --> 0.001 | Seconds_per_step --> 6.354 | +[2024-09-27 11:25:35,455][Main][INFO] - [train] Step 16925 out of 20000 | Loss --> 1.689 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.686 | Lr --> 0.001 | Seconds_per_step --> 6.356 | +[2024-09-27 11:28:14,256][Main][INFO] - [train] Step 16950 out of 20000 | Loss --> 1.700 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.691 | Lr --> 0.001 | Seconds_per_step --> 6.352 | +[2024-09-27 11:30:54,748][Main][INFO] - [train] Step 16975 out of 20000 | Loss --> 1.688 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.695 | Lr --> 0.001 | Seconds_per_step --> 6.420 | +[2024-09-27 11:33:33,514][Main][INFO] - [train] Step 17000 out of 20000 | Loss --> 1.697 | Grad_l2 --> 0.219 | Weights_l2 --> 11190.701 | Lr --> 0.001 | Seconds_per_step --> 6.351 | +[2024-09-27 11:36:12,532][Main][INFO] - [train] Step 17025 out of 20000 | Loss --> 1.700 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.704 | Lr --> 0.001 | Seconds_per_step --> 6.361 | +[2024-09-27 11:38:51,405][Main][INFO] - [train] Step 17050 out of 20000 | Loss --> 1.698 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.703 | Lr --> 0.001 | Seconds_per_step --> 6.355 | +[2024-09-27 11:41:31,997][Main][INFO] - [train] Step 17075 out of 20000 | Loss --> 1.693 | Grad_l2 --> 0.216 
| Weights_l2 --> 11190.707 | Lr --> 0.001 | Seconds_per_step --> 6.424 | +[2024-09-27 11:44:10,920][Main][INFO] - [train] Step 17100 out of 20000 | Loss --> 1.694 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.706 | Lr --> 0.001 | Seconds_per_step --> 6.357 | +[2024-09-27 11:46:49,623][Main][INFO] - [train] Step 17125 out of 20000 | Loss --> 1.694 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.707 | Lr --> 0.001 | Seconds_per_step --> 6.348 | +[2024-09-27 11:49:28,622][Main][INFO] - [train] Step 17150 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.707 | Lr --> 0.001 | Seconds_per_step --> 6.360 | +[2024-09-27 11:52:09,327][Main][INFO] - [train] Step 17175 out of 20000 | Loss --> 1.692 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.709 | Lr --> 0.001 | Seconds_per_step --> 6.428 | +[2024-09-27 11:54:48,386][Main][INFO] - [train] Step 17200 out of 20000 | Loss --> 1.693 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.711 | Lr --> 0.001 | Seconds_per_step --> 6.362 | +[2024-09-27 11:57:27,812][Main][INFO] - [train] Step 17225 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.216 | Weights_l2 --> 11190.712 | Lr --> 0.001 | Seconds_per_step --> 6.377 | +[2024-09-27 12:00:07,180][Main][INFO] - [train] Step 17250 out of 20000 | Loss --> 1.697 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.717 | Lr --> 0.001 | Seconds_per_step --> 6.375 | +[2024-09-27 12:02:46,263][Main][INFO] - [train] Step 17275 out of 20000 | Loss --> 1.687 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.719 | Lr --> 0.001 | Seconds_per_step --> 6.363 | +[2024-09-27 12:05:26,849][Main][INFO] - [train] Step 17300 out of 20000 | Loss --> 1.685 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.718 | Lr --> 0.001 | Seconds_per_step --> 6.423 | +[2024-09-27 12:08:06,159][Main][INFO] - [train] Step 17325 out of 20000 | Loss --> 1.683 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.717 | Lr --> 0.001 | Seconds_per_step --> 6.372 | +[2024-09-27 12:10:45,555][Main][INFO] - [train] Step 17350 out of 20000 | Loss --> 1.682 | 
Grad_l2 --> 0.213 | Weights_l2 --> 11190.718 | Lr --> 0.001 | Seconds_per_step --> 6.376 | +[2024-09-27 12:13:24,729][Main][INFO] - [train] Step 17375 out of 20000 | Loss --> 1.685 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.719 | Lr --> 0.001 | Seconds_per_step --> 6.367 | +[2024-09-27 12:16:05,076][Main][INFO] - [train] Step 17400 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.718 | Lr --> 0.001 | Seconds_per_step --> 6.414 | +[2024-09-27 12:18:44,023][Main][INFO] - [train] Step 17425 out of 20000 | Loss --> 1.673 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.720 | Lr --> 0.001 | Seconds_per_step --> 6.358 | +[2024-09-27 12:21:23,026][Main][INFO] - [train] Step 17450 out of 20000 | Loss --> 1.685 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.718 | Lr --> 0.001 | Seconds_per_step --> 6.360 | +[2024-09-27 12:24:01,750][Main][INFO] - [train] Step 17475 out of 20000 | Loss --> 1.687 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.719 | Lr --> 0.001 | Seconds_per_step --> 6.349 | +[2024-09-27 12:26:42,331][Main][INFO] - [train] Step 17500 out of 20000 | Loss --> 1.682 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.722 | Lr --> 0.001 | Seconds_per_step --> 6.423 | +[2024-09-27 12:26:42,332][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-17500 +[2024-09-27 12:26:42,340][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-09-27 12:26:50,794][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-17500/model.safetensors +[2024-09-27 12:27:01,219][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-17500/optimizer.bin +[2024-09-27 12:27:01,222][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-17500/scheduler.bin +[2024-09-27 12:27:01,223][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-17500/sampler.bin +[2024-09-27 12:27:01,225][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-17500/sampler_1.bin +[2024-09-27 12:27:01,226][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-17500/random_states_0.pkl +[2024-09-27 12:29:40,023][Main][INFO] - [train] Step 17525 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.724 | Lr --> 0.001 | Seconds_per_step --> 7.108 | +[2024-09-27 12:32:19,234][Main][INFO] - [train] Step 17550 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.723 | Lr --> 0.001 | Seconds_per_step --> 6.368 | +[2024-09-27 12:34:58,039][Main][INFO] - [train] Step 17575 out of 20000 | Loss --> 1.660 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.723 | Lr --> 0.001 | Seconds_per_step --> 6.352 | +[2024-09-27 12:37:38,370][Main][INFO] - [train] Step 17600 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.726 | Lr --> 0.001 | Seconds_per_step --> 6.413 | +[2024-09-27 12:40:16,941][Main][INFO] - [train] Step 17625 out of 20000 | Loss --> 1.676 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.724 | Lr --> 0.001 | Seconds_per_step --> 6.343 | +[2024-09-27 12:42:55,705][Main][INFO] - [train] Step 17650 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.721 | Lr --> 0.001 | Seconds_per_step --> 6.350 | +[2024-09-27 12:45:34,287][Main][INFO] - [train] Step 
17675 out of 20000 | Loss --> 1.692 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.719 | Lr --> 0.001 | Seconds_per_step --> 6.343 | +[2024-09-27 12:48:13,049][Main][INFO] - [train] Step 17700 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.722 | Lr --> 0.001 | Seconds_per_step --> 6.350 | +[2024-09-27 12:50:52,998][Main][INFO] - [train] Step 17725 out of 20000 | Loss --> 1.667 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.719 | Lr --> 0.001 | Seconds_per_step --> 6.398 | +[2024-09-27 12:53:31,812][Main][INFO] - [train] Step 17750 out of 20000 | Loss --> 1.685 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.719 | Lr --> 0.001 | Seconds_per_step --> 6.353 | +[2024-09-27 12:56:10,933][Main][INFO] - [train] Step 17775 out of 20000 | Loss --> 1.691 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.718 | Lr --> 0.001 | Seconds_per_step --> 6.365 | +[2024-09-27 12:58:49,937][Main][INFO] - [train] Step 17800 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.717 | Lr --> 0.001 | Seconds_per_step --> 6.360 | +[2024-09-27 13:01:30,312][Main][INFO] - [train] Step 17825 out of 20000 | Loss --> 1.682 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.716 | Lr --> 0.001 | Seconds_per_step --> 6.415 | +[2024-09-27 13:04:09,036][Main][INFO] - [train] Step 17850 out of 20000 | Loss --> 1.689 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.716 | Lr --> 0.001 | Seconds_per_step --> 6.349 | +[2024-09-27 13:06:47,891][Main][INFO] - [train] Step 17875 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.713 | Lr --> 0.001 | Seconds_per_step --> 6.354 | +[2024-09-27 13:09:26,845][Main][INFO] - [train] Step 17900 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.713 | Lr --> 0.000 | Seconds_per_step --> 6.358 | +[2024-09-27 13:12:07,065][Main][INFO] - [train] Step 17925 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.712 | Lr --> 0.000 | Seconds_per_step --> 6.409 | +[2024-09-27 13:14:45,823][Main][INFO] - 
[train] Step 17950 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.710 | Lr --> 0.000 | Seconds_per_step --> 6.350 | +[2024-09-27 13:17:24,368][Main][INFO] - [train] Step 17975 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.709 | Lr --> 0.000 | Seconds_per_step --> 6.342 | +[2024-09-27 13:20:03,093][Main][INFO] - [train] Step 18000 out of 20000 | Loss --> 1.697 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.708 | Lr --> 0.000 | Seconds_per_step --> 6.349 | +[2024-09-27 13:22:43,599][Main][INFO] - [train] Step 18025 out of 20000 | Loss --> 1.699 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.708 | Lr --> 0.000 | Seconds_per_step --> 6.420 | +[2024-09-27 13:25:22,461][Main][INFO] - [train] Step 18050 out of 20000 | Loss --> 1.685 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.706 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 13:28:01,262][Main][INFO] - [train] Step 18075 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.704 | Lr --> 0.000 | Seconds_per_step --> 6.352 | +[2024-09-27 13:30:39,920][Main][INFO] - [train] Step 18100 out of 20000 | Loss --> 1.687 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.703 | Lr --> 0.000 | Seconds_per_step --> 6.346 | +[2024-09-27 13:33:20,444][Main][INFO] - [train] Step 18125 out of 20000 | Loss --> 1.682 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.702 | Lr --> 0.000 | Seconds_per_step --> 6.421 | +[2024-09-27 13:35:59,557][Main][INFO] - [train] Step 18150 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.702 | Lr --> 0.000 | Seconds_per_step --> 6.364 | +[2024-09-27 13:38:38,879][Main][INFO] - [train] Step 18175 out of 20000 | Loss --> 1.676 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.698 | Lr --> 0.000 | Seconds_per_step --> 6.373 | +[2024-09-27 13:41:18,186][Main][INFO] - [train] Step 18200 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.697 | Lr --> 0.000 | Seconds_per_step --> 6.372 | +[2024-09-27 
13:43:56,953][Main][INFO] - [train] Step 18225 out of 20000 | Loss --> 1.682 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.697 | Lr --> 0.000 | Seconds_per_step --> 6.351 | +[2024-09-27 13:46:37,142][Main][INFO] - [train] Step 18250 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.695 | Lr --> 0.000 | Seconds_per_step --> 6.407 | +[2024-09-27 13:49:15,986][Main][INFO] - [train] Step 18275 out of 20000 | Loss --> 1.681 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.694 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 13:51:54,861][Main][INFO] - [train] Step 18300 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.692 | Lr --> 0.000 | Seconds_per_step --> 6.355 | +[2024-09-27 13:54:33,913][Main][INFO] - [train] Step 18325 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.690 | Lr --> 0.000 | Seconds_per_step --> 6.362 | +[2024-09-27 13:57:14,429][Main][INFO] - [train] Step 18350 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.689 | Lr --> 0.000 | Seconds_per_step --> 6.421 | +[2024-09-27 13:59:53,419][Main][INFO] - [train] Step 18375 out of 20000 | Loss --> 1.688 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.686 | Lr --> 0.000 | Seconds_per_step --> 6.359 | +[2024-09-27 14:02:32,420][Main][INFO] - [train] Step 18400 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.684 | Lr --> 0.000 | Seconds_per_step --> 6.360 | +[2024-09-27 14:05:11,270][Main][INFO] - [train] Step 18425 out of 20000 | Loss --> 1.672 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.683 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 14:07:51,397][Main][INFO] - [train] Step 18450 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.681 | Lr --> 0.000 | Seconds_per_step --> 6.405 | +[2024-09-27 14:10:30,137][Main][INFO] - [train] Step 18475 out of 20000 | Loss --> 1.673 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.681 | Lr --> 0.000 | Seconds_per_step --> 6.350 | 
+[2024-09-27 14:13:08,898][Main][INFO] - [train] Step 18500 out of 20000 | Loss --> 1.679 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.679 | Lr --> 0.000 | Seconds_per_step --> 6.350 | +[2024-09-27 14:15:47,637][Main][INFO] - [train] Step 18525 out of 20000 | Loss --> 1.672 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.679 | Lr --> 0.000 | Seconds_per_step --> 6.349 | +[2024-09-27 14:18:27,768][Main][INFO] - [train] Step 18550 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.210 | Weights_l2 --> 11190.679 | Lr --> 0.000 | Seconds_per_step --> 6.405 | +[2024-09-27 14:21:06,399][Main][INFO] - [train] Step 18575 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.677 | Lr --> 0.000 | Seconds_per_step --> 6.345 | +[2024-09-27 14:23:45,175][Main][INFO] - [train] Step 18600 out of 20000 | Loss --> 1.685 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.675 | Lr --> 0.000 | Seconds_per_step --> 6.351 | +[2024-09-27 14:26:23,892][Main][INFO] - [train] Step 18625 out of 20000 | Loss --> 1.688 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.672 | Lr --> 0.000 | Seconds_per_step --> 6.349 | +[2024-09-27 14:29:02,529][Main][INFO] - [train] Step 18650 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.671 | Lr --> 0.000 | Seconds_per_step --> 6.345 | +[2024-09-27 14:31:42,996][Main][INFO] - [train] Step 18675 out of 20000 | Loss --> 1.691 | Grad_l2 --> 0.217 | Weights_l2 --> 11190.670 | Lr --> 0.000 | Seconds_per_step --> 6.419 | +[2024-09-27 14:34:21,851][Main][INFO] - [train] Step 18700 out of 20000 | Loss --> 1.681 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.670 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 14:37:00,748][Main][INFO] - [train] Step 18725 out of 20000 | Loss --> 1.672 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.668 | Lr --> 0.000 | Seconds_per_step --> 6.356 | +[2024-09-27 14:39:39,264][Main][INFO] - [train] Step 18750 out of 20000 | Loss --> 1.683 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.668 | Lr --> 0.000 | 
Seconds_per_step --> 6.341 | +[2024-09-27 14:42:19,680][Main][INFO] - [train] Step 18775 out of 20000 | Loss --> 1.683 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.666 | Lr --> 0.000 | Seconds_per_step --> 6.417 | +[2024-09-27 14:44:58,257][Main][INFO] - [train] Step 18800 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.665 | Lr --> 0.000 | Seconds_per_step --> 6.343 | +[2024-09-27 14:47:37,218][Main][INFO] - [train] Step 18825 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.664 | Lr --> 0.000 | Seconds_per_step --> 6.358 | +[2024-09-27 14:50:16,057][Main][INFO] - [train] Step 18850 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.663 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 14:52:56,981][Main][INFO] - [train] Step 18875 out of 20000 | Loss --> 1.681 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.662 | Lr --> 0.000 | Seconds_per_step --> 6.437 | +[2024-09-27 14:55:35,831][Main][INFO] - [train] Step 18900 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.661 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 14:58:14,513][Main][INFO] - [train] Step 18925 out of 20000 | Loss --> 1.667 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.660 | Lr --> 0.000 | Seconds_per_step --> 6.347 | +[2024-09-27 15:00:53,368][Main][INFO] - [train] Step 18950 out of 20000 | Loss --> 1.690 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.658 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 15:03:33,537][Main][INFO] - [train] Step 18975 out of 20000 | Loss --> 1.679 | Grad_l2 --> 0.218 | Weights_l2 --> 11190.657 | Lr --> 0.000 | Seconds_per_step --> 6.407 | +[2024-09-27 15:06:12,258][Main][INFO] - [train] Step 19000 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.657 | Lr --> 0.000 | Seconds_per_step --> 6.349 | +[2024-09-27 15:08:50,936][Main][INFO] - [train] Step 19025 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.655 | 
Lr --> 0.000 | Seconds_per_step --> 6.347 | +[2024-09-27 15:11:29,702][Main][INFO] - [train] Step 19050 out of 20000 | Loss --> 1.679 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.655 | Lr --> 0.000 | Seconds_per_step --> 6.351 | +[2024-09-27 15:14:08,606][Main][INFO] - [train] Step 19075 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.653 | Lr --> 0.000 | Seconds_per_step --> 6.356 | +[2024-09-27 15:16:49,280][Main][INFO] - [train] Step 19100 out of 20000 | Loss --> 1.689 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.652 | Lr --> 0.000 | Seconds_per_step --> 6.427 | +[2024-09-27 15:19:28,149][Main][INFO] - [train] Step 19125 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.652 | Lr --> 0.000 | Seconds_per_step --> 6.355 | +[2024-09-27 15:22:07,000][Main][INFO] - [train] Step 19150 out of 20000 | Loss --> 1.671 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.651 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 15:24:45,842][Main][INFO] - [train] Step 19175 out of 20000 | Loss --> 1.679 | Grad_l2 --> 0.210 | Weights_l2 --> 11190.651 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 15:27:26,300][Main][INFO] - [train] Step 19200 out of 20000 | Loss --> 1.673 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.650 | Lr --> 0.000 | Seconds_per_step --> 6.418 | +[2024-09-27 15:30:05,121][Main][INFO] - [train] Step 19225 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.649 | Lr --> 0.000 | Seconds_per_step --> 6.353 | +[2024-09-27 15:32:43,879][Main][INFO] - [train] Step 19250 out of 20000 | Loss --> 1.690 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.648 | Lr --> 0.000 | Seconds_per_step --> 6.350 | +[2024-09-27 15:35:22,705][Main][INFO] - [train] Step 19275 out of 20000 | Loss --> 1.680 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.648 | Lr --> 0.000 | Seconds_per_step --> 6.353 | +[2024-09-27 15:38:03,056][Main][INFO] - [train] Step 19300 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.212 | Weights_l2 
--> 11190.647 | Lr --> 0.000 | Seconds_per_step --> 6.414 | +[2024-09-27 15:40:41,923][Main][INFO] - [train] Step 19325 out of 20000 | Loss --> 1.683 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.646 | Lr --> 0.000 | Seconds_per_step --> 6.355 | +[2024-09-27 15:43:20,730][Main][INFO] - [train] Step 19350 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.646 | Lr --> 0.000 | Seconds_per_step --> 6.352 | +[2024-09-27 15:45:59,520][Main][INFO] - [train] Step 19375 out of 20000 | Loss --> 1.666 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.645 | Lr --> 0.000 | Seconds_per_step --> 6.352 | +[2024-09-27 15:48:39,832][Main][INFO] - [train] Step 19400 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.644 | Lr --> 0.000 | Seconds_per_step --> 6.412 | +[2024-09-27 15:51:18,675][Main][INFO] - [train] Step 19425 out of 20000 | Loss --> 1.670 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.644 | Lr --> 0.000 | Seconds_per_step --> 6.354 | +[2024-09-27 15:53:57,562][Main][INFO] - [train] Step 19450 out of 20000 | Loss --> 1.684 | Grad_l2 --> 0.210 | Weights_l2 --> 11190.643 | Lr --> 0.000 | Seconds_per_step --> 6.355 | +[2024-09-27 15:56:36,502][Main][INFO] - [train] Step 19475 out of 20000 | Loss --> 1.669 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.643 | Lr --> 0.000 | Seconds_per_step --> 6.357 | +[2024-09-27 15:59:16,851][Main][INFO] - [train] Step 19500 out of 20000 | Loss --> 1.681 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.643 | Lr --> 0.000 | Seconds_per_step --> 6.414 | +[2024-09-27 16:01:55,784][Main][INFO] - [train] Step 19525 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.643 | Lr --> 0.000 | Seconds_per_step --> 6.357 | +[2024-09-27 16:04:34,728][Main][INFO] - [train] Step 19550 out of 20000 | Loss --> 1.686 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.642 | Lr --> 0.000 | Seconds_per_step --> 6.358 | +[2024-09-27 16:07:13,650][Main][INFO] - [train] Step 19575 out of 20000 | Loss --> 1.674 | Grad_l2 --> 0.212 
| Weights_l2 --> 11190.642 | Lr --> 0.000 | Seconds_per_step --> 6.357 | +[2024-09-27 16:09:52,413][Main][INFO] - [train] Step 19600 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.642 | Lr --> 0.000 | Seconds_per_step --> 6.350 | +[2024-09-27 16:12:32,775][Main][INFO] - [train] Step 19625 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.642 | Lr --> 0.000 | Seconds_per_step --> 6.414 | +[2024-09-27 16:15:11,896][Main][INFO] - [train] Step 19650 out of 20000 | Loss --> 1.678 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.642 | Lr --> 0.000 | Seconds_per_step --> 6.365 | +[2024-09-27 16:17:50,847][Main][INFO] - [train] Step 19675 out of 20000 | Loss --> 1.671 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.641 | Lr --> 0.000 | Seconds_per_step --> 6.358 | +[2024-09-27 16:20:29,482][Main][INFO] - [train] Step 19700 out of 20000 | Loss --> 1.683 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.641 | Lr --> 0.000 | Seconds_per_step --> 6.345 | +[2024-09-27 16:23:09,770][Main][INFO] - [train] Step 19725 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.641 | Lr --> 0.000 | Seconds_per_step --> 6.411 | +[2024-09-27 16:25:48,643][Main][INFO] - [train] Step 19750 out of 20000 | Loss --> 1.673 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.355 | +[2024-09-27 16:28:27,353][Main][INFO] - [train] Step 19775 out of 20000 | Loss --> 1.672 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.348 | +[2024-09-27 16:31:06,251][Main][INFO] - [train] Step 19800 out of 20000 | Loss --> 1.674 | Grad_l2 --> 0.213 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.356 | +[2024-09-27 16:33:46,752][Main][INFO] - [train] Step 19825 out of 20000 | Loss --> 1.669 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.420 | +[2024-09-27 16:36:25,583][Main][INFO] - [train] Step 19850 out of 20000 | Loss --> 1.671 | 
Grad_l2 --> 0.212 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.353 | +[2024-09-27 16:39:04,452][Main][INFO] - [train] Step 19875 out of 20000 | Loss --> 1.668 | Grad_l2 --> 0.212 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.355 | +[2024-09-27 16:41:43,146][Main][INFO] - [train] Step 19900 out of 20000 | Loss --> 1.668 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.348 | +[2024-09-27 16:44:21,864][Main][INFO] - [train] Step 19925 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.215 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.349 | +[2024-09-27 16:47:02,132][Main][INFO] - [train] Step 19950 out of 20000 | Loss --> 1.677 | Grad_l2 --> 0.214 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.411 | +[2024-09-27 16:49:40,711][Main][INFO] - [train] Step 19975 out of 20000 | Loss --> 1.679 | Grad_l2 --> 0.217 | Weights_l2 --> 11190.640 | Lr --> 0.000 | Seconds_per_step --> 6.343 | +[2024-09-27 16:52:19,469][Main][INFO] - [train] Step 20000 out of 20000 | Loss --> 1.675 | Grad_l2 --> 0.211 | Weights_l2 --> 11190.639 | Lr --> 0.000 | Seconds_per_step --> 6.350 | +[2024-09-27 16:52:19,470][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 +[2024-09-27 16:52:19,477][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading
+[2024-09-27 16:52:27,623][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors
+[2024-09-27 16:52:37,316][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin
+[2024-09-27 16:52:37,318][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin
+[2024-09-27 16:52:37,318][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin
+[2024-09-27 16:52:37,318][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin
+[2024-09-27 16:52:37,320][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl
+[2024-09-27 16:52:40,867][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 16 (max is dataset.n_shards=8). Stopping 8 dataloader workers.
+[2024-09-27 17:10:01,473][Main][INFO] - [eval] Step 20001 out of 20000 | Loss --> 2.033 | Accuracy --> 0.610 | Time --> 1042.286 |
+[2024-09-27 17:10:01,478][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20001
+[2024-09-27 17:10:01,486][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
+[2024-09-27 17:10:08,440][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20001/model.safetensors
+[2024-09-27 17:10:18,333][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20001/optimizer.bin
+[2024-09-27 17:10:18,335][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20001/scheduler.bin
+[2024-09-27 17:10:18,335][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20001/sampler.bin
+[2024-09-27 17:10:18,336][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20001/sampler_1.bin
+[2024-09-27 17:10:18,337][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20001/random_states_0.pkl