[2024-08-30 19:59:22,813][Main][INFO] - Distributed environment: DistributedType.NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-08-30 19:59:22,813][Main][INFO] - Working directory is /workspace/nanoT5/outputs/2024-08-30/19-59-22
[2024-08-30 20:06:09,361][Main][INFO] - [train] Step 25 out of 20000 | Loss --> 4.391 | Grad_l2 --> 2.207 | Weights_l2 --> 11056.723 | Lr --> 0.005 | Seconds_per_step --> 15.276 |
[2024-08-30 20:08:14,532][Main][INFO] - [train] Step 50 out of 20000 | Loss --> 2.862 | Grad_l2 --> 1.035 | Weights_l2 --> 11056.373 | Lr --> 0.005 | Seconds_per_step --> 5.007 |
[2024-08-30 20:10:16,613][Main][INFO] - [train] Step 75 out of 20000 | Loss --> 2.250 | Grad_l2 --> 0.494 | Weights_l2 --> 11056.160 | Lr --> 0.005 | Seconds_per_step --> 4.883 |
[2024-08-30 20:12:18,421][Main][INFO] - [train] Step 100 out of 20000 | Loss --> 2.150 | Grad_l2 --> 0.397 | Weights_l2 --> 11055.979 | Lr --> 0.005 | Seconds_per_step --> 4.872 |
[2024-08-30 20:14:21,230][Main][INFO] - [train] Step 125 out of 20000 | Loss --> 2.123 | Grad_l2 --> 0.386 | Weights_l2 --> 11055.813 | Lr --> 0.005 | Seconds_per_step --> 4.912 |
[2024-08-30 20:16:22,809][Main][INFO] - [train] Step 150 out of 20000 | Loss --> 2.087 | Grad_l2 --> 0.343 | Weights_l2 --> 11055.659 | Lr --> 0.005 | Seconds_per_step --> 4.863 |
[2024-08-30 20:18:24,499][Main][INFO] - [train] Step 175 out of 20000 | Loss --> 2.082 | Grad_l2 --> 0.344 | Weights_l2 --> 11055.517 | Lr --> 0.005 | Seconds_per_step --> 4.868 |
[2024-08-30 20:20:27,639][Main][INFO] - [train] Step 200 out of 20000 | Loss --> 2.052 | Grad_l2 --> 0.376 | Weights_l2 --> 11055.392 | Lr --> 0.005 | Seconds_per_step --> 4.926 |
[2024-08-30 20:22:29,317][Main][INFO] - [train] Step 225 out of 20000 | Loss --> 2.057 | Grad_l2 --> 0.339 | Weights_l2 --> 11055.279 | Lr --> 0.005 | Seconds_per_step --> 4.867 |
[2024-08-30 20:24:30,866][Main][INFO] - [train] Step 250 out of 20000 | Loss --> 2.051 | Grad_l2 --> 0.321 | Weights_l2 --> 11055.177 | Lr --> 0.005 | Seconds_per_step --> 4.862 |
[2024-08-30 20:26:33,835][Main][INFO] - [train] Step 275 out of 20000 | Loss --> 2.040 | Grad_l2 --> 0.324 | Weights_l2 --> 11055.071 | Lr --> 0.005 | Seconds_per_step --> 4.919 |
[2024-08-30 20:28:35,215][Main][INFO] - [train] Step 300 out of 20000 | Loss --> 2.023 | Grad_l2 --> 0.312 | Weights_l2 --> 11054.978 | Lr --> 0.005 | Seconds_per_step --> 4.855 |
[2024-08-30 20:30:36,882][Main][INFO] - [train] Step 325 out of 20000 | Loss --> 2.027 | Grad_l2 --> 0.310 | Weights_l2 --> 11054.902 | Lr --> 0.005 | Seconds_per_step --> 4.867 |
[2024-08-30 20:32:38,372][Main][INFO] - [train] Step 350 out of 20000 | Loss --> 2.017 | Grad_l2 --> 0.310 | Weights_l2 --> 11054.828 | Lr --> 0.005 | Seconds_per_step --> 4.860 |
[2024-08-30 20:34:41,599][Main][INFO] - [train] Step 375 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.317 | Weights_l2 --> 11054.757 | Lr --> 0.005 | Seconds_per_step --> 4.929 |
[2024-08-30 20:36:42,927][Main][INFO] - [train] Step 400 out of 20000 | Loss --> 2.021 | Grad_l2 --> 0.310 | Weights_l2 --> 11054.697 | Lr --> 0.005 | Seconds_per_step --> 4.853 |
[2024-08-30 20:38:44,347][Main][INFO] - [train] Step 425 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.311 | Weights_l2 --> 11054.646 | Lr --> 0.005 | Seconds_per_step --> 4.857 |
[2024-08-30 20:40:47,234][Main][INFO] - [train] Step 450 out of 20000 | Loss --> 2.002 | Grad_l2 --> 0.318 | Weights_l2 --> 11054.594 | Lr --> 0.005 | Seconds_per_step --> 4.915 |
[2024-08-30 20:42:48,664][Main][INFO] - [train] Step 475 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.308 | Weights_l2 --> 11054.566 | Lr --> 0.005 | Seconds_per_step --> 4.857 |
[2024-08-30 20:44:50,113][Main][INFO] - [train] Step 500 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.294 | Weights_l2 --> 11054.528 | Lr --> 0.005 | Seconds_per_step --> 4.858 |
[2024-08-30 20:46:54,282][Main][INFO] - [train] Step 525 out of 20000 | Loss --> 1.996 | Grad_l2 --> 0.299 | Weights_l2 --> 11054.503 | Lr --> 0.006 | Seconds_per_step --> 4.967 |
[2024-08-30 20:48:55,732][Main][INFO] - [train] Step 550 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.289 | Weights_l2 --> 11054.484 | Lr --> 0.006 | Seconds_per_step --> 4.858 |
[2024-08-30 20:50:57,238][Main][INFO] - [train] Step 575 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.300 | Weights_l2 --> 11054.467 | Lr --> 0.006 | Seconds_per_step --> 4.860 |
[2024-08-30 20:53:00,169][Main][INFO] - [train] Step 600 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.293 | Weights_l2 --> 11054.458 | Lr --> 0.006 | Seconds_per_step --> 4.917 |
[2024-08-30 20:55:01,602][Main][INFO] - [train] Step 625 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.292 | Weights_l2 --> 11054.449 | Lr --> 0.006 | Seconds_per_step --> 4.857 |
[2024-08-30 20:57:02,932][Main][INFO] - [train] Step 650 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.295 | Weights_l2 --> 11054.451 | Lr --> 0.006 | Seconds_per_step --> 4.853 |
[2024-08-30 20:59:05,725][Main][INFO] - [train] Step 675 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.289 | Weights_l2 --> 11054.454 | Lr --> 0.006 | Seconds_per_step --> 4.912 |
[2024-08-30 21:01:07,152][Main][INFO] - [train] Step 700 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.280 | Weights_l2 --> 11054.474 | Lr --> 0.006 | Seconds_per_step --> 4.857 |
[2024-08-30 21:03:08,697][Main][INFO] - [train] Step 725 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.293 | Weights_l2 --> 11054.492 | Lr --> 0.006 | Seconds_per_step --> 4.862 |
[2024-08-30 21:05:10,526][Main][INFO] - [train] Step 750 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.284 | Weights_l2 --> 11054.517 | Lr --> 0.006 | Seconds_per_step --> 4.873 |
[2024-08-30 21:07:13,559][Main][INFO] - [train] Step 775 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.288 | Weights_l2 --> 11054.554 | Lr --> 0.006 | Seconds_per_step --> 4.921 |
[2024-08-30 21:09:14,948][Main][INFO] - [train] Step 800 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.286 | Weights_l2 --> 11054.591 | Lr --> 0.006 | Seconds_per_step --> 4.855 |
[2024-08-30 21:11:16,204][Main][INFO] - [train] Step 825 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.289 | Weights_l2 --> 11054.632 | Lr --> 0.006 | Seconds_per_step --> 4.850 |
[2024-08-30 21:13:19,020][Main][INFO] - [train] Step 850 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.282 | Weights_l2 --> 11054.682 | Lr --> 0.006 | Seconds_per_step --> 4.913 |
[2024-08-30 21:15:20,394][Main][INFO] - [train] Step 875 out of 20000 | Loss --> 1.980 | Grad_l2 --> 0.295 | Weights_l2 --> 11054.735 | Lr --> 0.006 | Seconds_per_step --> 4.855 |
[2024-08-30 21:17:22,021][Main][INFO] - [train] Step 900 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.279 | Weights_l2 --> 11054.776 | Lr --> 0.006 | Seconds_per_step --> 4.865 |
[2024-08-30 21:19:25,138][Main][INFO] - [train] Step 925 out of 20000 | Loss --> 1.993 | Grad_l2 --> 0.278 | Weights_l2 --> 11054.839 | Lr --> 0.006 | Seconds_per_step --> 4.925 |
[2024-08-30 21:21:26,547][Main][INFO] - [train] Step 950 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.281 | Weights_l2 --> 11054.902 | Lr --> 0.006 | Seconds_per_step --> 4.856 |
[2024-08-30 21:23:28,173][Main][INFO] - [train] Step 975 out of 20000 | Loss --> 1.993 | Grad_l2 --> 0.291 | Weights_l2 --> 11054.965 | Lr --> 0.006 | Seconds_per_step --> 4.865 |
[2024-08-30 21:25:31,788][Main][INFO] - [train] Step 1000 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.284 | Weights_l2 --> 11055.036 | Lr --> 0.006 | Seconds_per_step --> 4.945 |
[2024-08-30 21:27:33,458][Main][INFO] - [train] Step 1025 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.289 | Weights_l2 --> 11055.115 | Lr --> 0.006 | Seconds_per_step --> 4.867 |
[2024-08-30 21:29:35,109][Main][INFO] - [train] Step 1050 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.291 | Weights_l2 --> 11055.204 | Lr --> 0.006 | Seconds_per_step --> 4.866 |
[2024-08-30 21:31:38,572][Main][INFO] - [train] Step 1075 out of 20000 | Loss --> 1.976 | Grad_l2 --> 0.280 | Weights_l2 --> 11055.287 | Lr --> 0.006 | Seconds_per_step --> 4.938 |
[2024-08-30 21:33:40,356][Main][INFO] - [train] Step 1100 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.276 | Weights_l2 --> 11055.379 | Lr --> 0.006 | Seconds_per_step --> 4.871 |
[2024-08-30 21:35:41,903][Main][INFO] - [train] Step 1125 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.291 | Weights_l2 --> 11055.470 | Lr --> 0.006 | Seconds_per_step --> 4.862 |
[2024-08-30 21:37:45,131][Main][INFO] - [train] Step 1150 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.291 | Weights_l2 --> 11055.565 | Lr --> 0.006 | Seconds_per_step --> 4.929 |
[2024-08-30 21:39:46,791][Main][INFO] - [train] Step 1175 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.276 | Weights_l2 --> 11055.661 | Lr --> 0.006 | Seconds_per_step --> 4.866 |
[2024-08-30 21:41:48,783][Main][INFO] - [train] Step 1200 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.281 | Weights_l2 --> 11055.773 | Lr --> 0.006 | Seconds_per_step --> 4.880 |
[2024-08-30 21:43:50,674][Main][INFO] - [train] Step 1225 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.282 | Weights_l2 --> 11055.884 | Lr --> 0.006 | Seconds_per_step --> 4.876 |
[2024-08-30 21:45:53,495][Main][INFO] - [train] Step 1250 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.281 | Weights_l2 --> 11056.005 | Lr --> 0.006 | Seconds_per_step --> 4.913 |
[2024-08-30 21:47:55,086][Main][INFO] - [train] Step 1275 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.278 | Weights_l2 --> 11056.124 | Lr --> 0.006 | Seconds_per_step --> 4.864 |
[2024-08-30 21:49:56,602][Main][INFO] - [train] Step 1300 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.276 | Weights_l2 --> 11056.241 | Lr --> 0.006 | Seconds_per_step --> 4.861 |
[2024-08-30 21:51:59,919][Main][INFO] - [train] Step 1325 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.285 | Weights_l2 --> 11056.366 | Lr --> 0.006 | Seconds_per_step --> 4.933 |
[2024-08-30 21:54:01,543][Main][INFO] - [train] Step 1350 out of 20000 | Loss --> 1.977 | Grad_l2 --> 0.281 | Weights_l2 --> 11056.496 | Lr --> 0.006 | Seconds_per_step --> 4.865 |
[2024-08-30 21:56:03,378][Main][INFO] - [train] Step 1375 out of 20000 | Loss --> 1.964 | Grad_l2 --> 0.268 | Weights_l2 --> 11056.630 | Lr --> 0.006 | Seconds_per_step --> 4.873 |
[2024-08-30 21:58:06,956][Main][INFO] - [train] Step 1400 out of 20000 | Loss --> 1.983 | Grad_l2 --> 0.275 | Weights_l2 --> 11056.772 | Lr --> 0.006 | Seconds_per_step --> 4.943 |
[2024-08-30 22:00:09,383][Main][INFO] - [train] Step 1425 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.278 | Weights_l2 --> 11056.933 | Lr --> 0.006 | Seconds_per_step --> 4.897 |
[2024-08-30 22:02:11,501][Main][INFO] - [train] Step 1450 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.273 | Weights_l2 --> 11057.076 | Lr --> 0.006 | Seconds_per_step --> 4.885 |
[2024-08-30 22:04:14,897][Main][INFO] - [train] Step 1475 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.274 | Weights_l2 --> 11057.230 | Lr --> 0.006 | Seconds_per_step --> 4.936 |
[2024-08-30 22:06:16,602][Main][INFO] - [train] Step 1500 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.285 | Weights_l2 --> 11057.386 | Lr --> 0.006 | Seconds_per_step --> 4.868 |
[2024-08-30 22:08:18,616][Main][INFO] - [train] Step 1525 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.276 | Weights_l2 --> 11057.546 | Lr --> 0.007 | Seconds_per_step --> 4.880 |
[2024-08-30 22:10:21,835][Main][INFO] - [train] Step 1550 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.292 | Weights_l2 --> 11057.711 | Lr --> 0.007 | Seconds_per_step --> 4.929 |
[2024-08-30 22:12:23,457][Main][INFO] - [train] Step 1575 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.273 | Weights_l2 --> 11057.883 | Lr --> 0.007 | Seconds_per_step --> 4.865 |
[2024-08-30 22:14:25,060][Main][INFO] - [train] Step 1600 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.273 | Weights_l2 --> 11058.050 | Lr --> 0.007 | Seconds_per_step --> 4.864 |
[2024-08-30 22:16:26,524][Main][INFO] - [train] Step 1625 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.268 | Weights_l2 --> 11058.218 | Lr --> 0.007 | Seconds_per_step --> 4.858 |
[2024-08-30 22:18:29,600][Main][INFO] - [train] Step 1650 out of 20000 | Loss --> 1.980 | Grad_l2 --> 0.277 | Weights_l2 --> 11058.394 | Lr --> 0.007 | Seconds_per_step --> 4.923 |
[2024-08-30 22:20:31,304][Main][INFO] - [train] Step 1675 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.275 | Weights_l2 --> 11058.590 | Lr --> 0.007 | Seconds_per_step --> 4.868 |
[2024-08-30 22:22:32,899][Main][INFO] - [train] Step 1700 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.261 | Weights_l2 --> 11058.772 | Lr --> 0.007 | Seconds_per_step --> 4.864 |
[2024-08-30 22:24:35,876][Main][INFO] - [train] Step 1725 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.273 | Weights_l2 --> 11058.955 | Lr --> 0.007 | Seconds_per_step --> 4.919 |
[2024-08-30 22:26:37,549][Main][INFO] - [train] Step 1750 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.279 | Weights_l2 --> 11059.163 | Lr --> 0.007 | Seconds_per_step --> 4.867 |
[2024-08-30 22:28:39,011][Main][INFO] - [train] Step 1775 out of 20000 | Loss --> 1.968 | Grad_l2 --> 0.288 | Weights_l2 --> 11059.360 | Lr --> 0.007 | Seconds_per_step --> 4.858 |
[2024-08-30 22:30:42,207][Main][INFO] - [train] Step 1800 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.271 | Weights_l2 --> 11059.569 | Lr --> 0.007 | Seconds_per_step --> 4.928 |
[2024-08-30 22:32:43,844][Main][INFO] - [train] Step 1825 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.261 | Weights_l2 --> 11059.786 | Lr --> 0.007 | Seconds_per_step --> 4.865 |
[2024-08-30 22:34:45,353][Main][INFO] - [train] Step 1850 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.270 | Weights_l2 --> 11059.993 | Lr --> 0.007 | Seconds_per_step --> 4.860 |
[2024-08-30 22:36:48,555][Main][INFO] - [train] Step 1875 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.278 | Weights_l2 --> 11060.217 | Lr --> 0.007 | Seconds_per_step --> 4.928 |
[2024-08-30 22:38:49,998][Main][INFO] - [train] Step 1900 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.270 | Weights_l2 --> 11060.438 | Lr --> 0.007 | Seconds_per_step --> 4.858 |
[2024-08-30 22:40:51,502][Main][INFO] - [train] Step 1925 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.273 | Weights_l2 --> 11060.667 | Lr --> 0.007 | Seconds_per_step --> 4.860 |
[2024-08-30 22:42:54,659][Main][INFO] - [train] Step 1950 out of 20000 | Loss --> 1.980 | Grad_l2 --> 0.277 | Weights_l2 --> 11060.890 | Lr --> 0.007 | Seconds_per_step --> 4.926 |
[2024-08-30 22:44:56,169][Main][INFO] - [train] Step 1975 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.283 | Weights_l2 --> 11061.121 | Lr --> 0.007 | Seconds_per_step --> 4.860 |
[2024-08-30 22:46:57,716][Main][INFO] - [train] Step 2000 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.276 | Weights_l2 --> 11061.353 | Lr --> 0.007 | Seconds_per_step --> 4.862 |
[2024-08-30 22:49:00,871][Main][INFO] - [train] Step 2025 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.299 | Weights_l2 --> 11061.589 | Lr --> 0.007 | Seconds_per_step --> 4.926 |
[2024-08-30 22:51:02,499][Main][INFO] - [train] Step 2050 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.317 | Weights_l2 --> 11061.853 | Lr --> 0.007 | Seconds_per_step --> 4.865 |
[2024-08-30 22:53:03,994][Main][INFO] - [train] Step 2075 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.311 | Weights_l2 --> 11062.136 | Lr --> 0.007 | Seconds_per_step --> 4.860 |
[2024-08-30 22:55:05,663][Main][INFO] - [train] Step 2100 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.285 | Weights_l2 --> 11062.396 | Lr --> 0.007 | Seconds_per_step --> 4.867 |
[2024-08-30 22:57:08,808][Main][INFO] - [train] Step 2125 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.270 | Weights_l2 --> 11062.658 | Lr --> 0.007 | Seconds_per_step --> 4.926 |
[2024-08-30 22:59:10,359][Main][INFO] - [train] Step 2150 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.267 | Weights_l2 --> 11062.926 | Lr --> 0.007 | Seconds_per_step --> 4.862 |
[2024-08-30 23:01:11,917][Main][INFO] - [train] Step 2175 out of 20000 | Loss --> 1.977 | Grad_l2 --> 0.267 | Weights_l2 --> 11063.190 | Lr --> 0.007 | Seconds_per_step --> 4.862 |
[2024-08-30 23:03:14,975][Main][INFO] - [train] Step 2200 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.266 | Weights_l2 --> 11063.461 | Lr --> 0.007 | Seconds_per_step --> 4.922 |
[2024-08-30 23:05:16,721][Main][INFO] - [train] Step 2225 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.272 | Weights_l2 --> 11063.731 | Lr --> 0.007 | Seconds_per_step --> 4.870 |
[2024-08-30 23:07:18,331][Main][INFO] - [train] Step 2250 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.272 | Weights_l2 --> 11064.006 | Lr --> 0.007 | Seconds_per_step --> 4.864 |
[2024-08-30 23:09:21,568][Main][INFO] - [train] Step 2275 out of 20000 | Loss --> 1.975 | Grad_l2 --> 0.289 | Weights_l2 --> 11064.290 | Lr --> 0.007 | Seconds_per_step --> 4.929 |
[2024-08-30 23:11:23,165][Main][INFO] - [train] Step 2300 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.266 | Weights_l2 --> 11064.585 | Lr --> 0.007 | Seconds_per_step --> 4.864 |
[2024-08-30 23:13:24,793][Main][INFO] - [train] Step 2325 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.266 | Weights_l2 --> 11064.868 | Lr --> 0.007 | Seconds_per_step --> 4.865 |
[2024-08-30 23:15:27,683][Main][INFO] - [train] Step 2350 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.272 | Weights_l2 --> 11065.159 | Lr --> 0.007 | Seconds_per_step --> 4.915 |
[2024-08-30 23:17:29,152][Main][INFO] - [train] Step 2375 out of 20000 | Loss --> 1.979 | Grad_l2 --> 0.268 | Weights_l2 --> 11065.443 | Lr --> 0.007 | Seconds_per_step --> 4.859 |
[2024-08-30 23:19:30,809][Main][INFO] - [train] Step 2400 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.271 | Weights_l2 --> 11065.744 | Lr --> 0.007 | Seconds_per_step --> 4.866 |
[2024-08-30 23:21:33,838][Main][INFO] - [train] Step 2425 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.270 | Weights_l2 --> 11066.038 | Lr --> 0.007 | Seconds_per_step --> 4.921 |
[2024-08-30 23:23:35,410][Main][INFO] - [train] Step 2450 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.265 | Weights_l2 --> 11066.358 | Lr --> 0.007 | Seconds_per_step --> 4.863 |
[2024-08-30 23:25:36,763][Main][INFO] - [train] Step 2475 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.265 | Weights_l2 --> 11066.673 | Lr --> 0.007 | Seconds_per_step --> 4.854 |
[2024-08-30 23:27:40,029][Main][INFO] - [train] Step 2500 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.260 | Weights_l2 --> 11067.003 | Lr --> 0.007 | Seconds_per_step --> 4.931 |
[2024-08-30 23:27:40,030][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-2500
[2024-08-30 23:27:40,037][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-30 23:27:46,340][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-2500/model.safetensors
[2024-08-30 23:27:54,633][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-2500/optimizer.bin
[2024-08-30 23:27:54,635][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-2500/scheduler.bin
[2024-08-30 23:27:54,635][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-2500/sampler.bin
[2024-08-30 23:27:54,636][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-2500/sampler_1.bin
[2024-08-30 23:27:54,637][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-2500/random_states_0.pkl
[2024-08-30 23:29:55,808][Main][INFO] - [train] Step 2525 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.257 | Weights_l2 --> 11067.308 | Lr --> 0.008 | Seconds_per_step --> 5.431 |
[2024-08-30 23:31:57,180][Main][INFO] - [train] Step 2550 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.265 | Weights_l2 --> 11067.622 | Lr --> 0.008 | Seconds_per_step --> 4.855 |
[2024-08-30 23:33:58,748][Main][INFO] - [train] Step 2575 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.267 | Weights_l2 --> 11067.941 | Lr --> 0.008 | Seconds_per_step --> 4.863 |
[2024-08-30 23:36:01,811][Main][INFO] - [train] Step 2600 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.265 | Weights_l2 --> 11068.275 | Lr --> 0.008 | Seconds_per_step --> 4.922 |
[2024-08-30 23:38:03,659][Main][INFO] - [train] Step 2625 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.268 | Weights_l2 --> 11068.607 | Lr --> 0.008 | Seconds_per_step --> 4.874 |
[2024-08-30 23:40:05,277][Main][INFO] - [train] Step 2650 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.273 | Weights_l2 --> 11068.953 | Lr --> 0.008 | Seconds_per_step --> 4.865 |
[2024-08-30 23:42:08,293][Main][INFO] - [train] Step 2675 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.261 | Weights_l2 --> 11069.294 | Lr --> 0.008 | Seconds_per_step --> 4.921 |
[2024-08-30 23:44:09,983][Main][INFO] - [train] Step 2700 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.265 | Weights_l2 --> 11069.640 | Lr --> 0.008 | Seconds_per_step --> 4.867 |
[2024-08-30 23:46:11,536][Main][INFO] - [train] Step 2725 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.264 | Weights_l2 --> 11069.995 | Lr --> 0.008 | Seconds_per_step --> 4.862 |
[2024-08-30 23:48:14,785][Main][INFO] - [train] Step 2750 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.276 | Weights_l2 --> 11070.357 | Lr --> 0.008 | Seconds_per_step --> 4.930 |
[2024-08-30 23:50:16,364][Main][INFO] - [train] Step 2775 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.266 | Weights_l2 --> 11070.713 | Lr --> 0.008 | Seconds_per_step --> 4.863 |
[2024-08-30 23:52:17,766][Main][INFO] - [train] Step 2800 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.262 | Weights_l2 --> 11071.068 | Lr --> 0.008 | Seconds_per_step --> 4.856 |
[2024-08-30 23:54:20,997][Main][INFO] - [train] Step 2825 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.264 | Weights_l2 --> 11071.444 | Lr --> 0.008 | Seconds_per_step --> 4.929 |
[2024-08-30 23:56:22,682][Main][INFO] - [train] Step 2850 out of 20000 | Loss --> 1.996 | Grad_l2 --> 0.255 | Weights_l2 --> 11071.815 | Lr --> 0.008 | Seconds_per_step --> 4.867 |
[2024-08-30 23:58:24,386][Main][INFO] - [train] Step 2875 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.264 | Weights_l2 --> 11072.191 | Lr --> 0.008 | Seconds_per_step --> 4.868 |
[2024-08-31 00:00:27,429][Main][INFO] - [train] Step 2900 out of 20000 | Loss --> 1.980 | Grad_l2 --> 0.269 | Weights_l2 --> 11072.549 | Lr --> 0.008 | Seconds_per_step --> 4.922 |
[2024-08-31 00:02:29,033][Main][INFO] - [train] Step 2925 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.267 | Weights_l2 --> 11072.930 | Lr --> 0.008 | Seconds_per_step --> 4.864 |
[2024-08-31 00:04:30,570][Main][INFO] - [train] Step 2950 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.265 | Weights_l2 --> 11073.307 | Lr --> 0.008 | Seconds_per_step --> 4.861 |
[2024-08-31 00:06:33,773][Main][INFO] - [train] Step 2975 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.266 | Weights_l2 --> 11073.701 | Lr --> 0.008 | Seconds_per_step --> 4.928 |
[2024-08-31 00:08:35,245][Main][INFO] - [train] Step 3000 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.260 | Weights_l2 --> 11074.093 | Lr --> 0.008 | Seconds_per_step --> 4.859 |
[2024-08-31 00:10:36,620][Main][INFO] - [train] Step 3025 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.258 | Weights_l2 --> 11074.474 | Lr --> 0.008 | Seconds_per_step --> 4.855 |
[2024-08-31 00:12:38,050][Main][INFO] - [train] Step 3050 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.259 | Weights_l2 --> 11074.878 | Lr --> 0.008 | Seconds_per_step --> 4.857 |
[2024-08-31 00:14:41,198][Main][INFO] - [train] Step 3075 out of 20000 | Loss --> 1.982 | Grad_l2 --> 0.255 | Weights_l2 --> 11075.266 | Lr --> 0.008 | Seconds_per_step --> 4.926 |
[2024-08-31 00:16:42,781][Main][INFO] - [train] Step 3100 out of 20000 | Loss --> 2.015 | Grad_l2 --> 0.263 | Weights_l2 --> 11075.677 | Lr --> 0.008 | Seconds_per_step --> 4.863 |
[2024-08-31 00:18:44,360][Main][INFO] - [train] Step 3125 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.263 | Weights_l2 --> 11076.094 | Lr --> 0.008 | Seconds_per_step --> 4.863 |
[2024-08-31 00:20:47,387][Main][INFO] - [train] Step 3150 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.264 | Weights_l2 --> 11076.505 | Lr --> 0.008 | Seconds_per_step --> 4.921 |
[2024-08-31 00:22:48,826][Main][INFO] - [train] Step 3175 out of 20000 | Loss --> 1.993 | Grad_l2 --> 0.261 | Weights_l2 --> 11076.927 | Lr --> 0.008 | Seconds_per_step --> 4.857 |
[2024-08-31 00:24:50,393][Main][INFO] - [train] Step 3200 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.255 | Weights_l2 --> 11077.361 | Lr --> 0.008 | Seconds_per_step --> 4.863 |
[2024-08-31 00:26:53,506][Main][INFO] - [train] Step 3225 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.256 | Weights_l2 --> 11077.799 | Lr --> 0.008 | Seconds_per_step --> 4.924 |
[2024-08-31 00:28:54,971][Main][INFO] - [train] Step 3250 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.254 | Weights_l2 --> 11078.230 | Lr --> 0.008 | Seconds_per_step --> 4.859 |
[2024-08-31 00:30:56,451][Main][INFO] - [train] Step 3275 out of 20000 | Loss --> 1.977 | Grad_l2 --> 0.250 | Weights_l2 --> 11078.643 | Lr --> 0.008 | Seconds_per_step --> 4.859 |
[2024-08-31 00:32:59,745][Main][INFO] - [train] Step 3300 out of 20000 | Loss --> 1.986 | Grad_l2 --> 0.253 | Weights_l2 --> 11079.072 | Lr --> 0.008 | Seconds_per_step --> 4.932 |
[2024-08-31 00:35:01,169][Main][INFO] - [train] Step 3325 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.260 | Weights_l2 --> 11079.525 | Lr --> 0.008 | Seconds_per_step --> 4.857 |
[2024-08-31 00:37:02,774][Main][INFO] - [train] Step 3350 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.255 | Weights_l2 --> 11079.975 | Lr --> 0.008 | Seconds_per_step --> 4.864 |
[2024-08-31 00:39:05,731][Main][INFO] - [train] Step 3375 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.259 | Weights_l2 --> 11080.434 | Lr --> 0.008 | Seconds_per_step --> 4.918 |
[2024-08-31 00:41:07,178][Main][INFO] - [train] Step 3400 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.254 | Weights_l2 --> 11080.887 | Lr --> 0.008 | Seconds_per_step --> 4.858 |
[2024-08-31 00:43:08,627][Main][INFO] - [train] Step 3425 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.255 | Weights_l2 --> 11081.361 | Lr --> 0.008 | Seconds_per_step --> 4.858 |
[2024-08-31 00:45:10,052][Main][INFO] - [train] Step 3450 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.263 | Weights_l2 --> 11081.818 | Lr --> 0.008 | Seconds_per_step --> 4.857 |
[2024-08-31 00:47:13,035][Main][INFO] - [train] Step 3475 out of 20000 | Loss --> 2.002 | Grad_l2 --> 0.258 | Weights_l2 --> 11082.298 | Lr --> 0.008 | Seconds_per_step --> 4.919 |
[2024-08-31 00:49:14,505][Main][INFO] - [train] Step 3500 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.252 | Weights_l2 --> 11082.768 | Lr --> 0.008 | Seconds_per_step --> 4.859 |
[2024-08-31 00:51:15,990][Main][INFO] - [train] Step 3525 out of 20000 | Loss --> 1.986 | Grad_l2 --> 0.264 | Weights_l2 --> 11083.233 | Lr --> 0.009 | Seconds_per_step --> 4.859 |
[2024-08-31 00:53:19,215][Main][INFO] - [train] Step 3550 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.249 | Weights_l2 --> 11083.728 | Lr --> 0.009 | Seconds_per_step --> 4.929 |
[2024-08-31 00:55:20,726][Main][INFO] - [train] Step 3575 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.258 | Weights_l2 --> 11084.204 | Lr --> 0.009 | Seconds_per_step --> 4.860 |
[2024-08-31 00:57:22,302][Main][INFO] - [train] Step 3600 out of 20000 | Loss --> 2.011 | Grad_l2 --> 0.258 | Weights_l2 --> 11084.691 | Lr --> 0.009 | Seconds_per_step --> 4.863 |
[2024-08-31 00:59:25,171][Main][INFO] - [train] Step 3625 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.254 | Weights_l2 --> 11085.195 | Lr --> 0.009 | Seconds_per_step --> 4.915 |
[2024-08-31 01:01:26,621][Main][INFO] - [train] Step 3650 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.264 | Weights_l2 --> 11085.705 | Lr --> 0.009 | Seconds_per_step --> 4.858 |
[2024-08-31 01:03:28,128][Main][INFO] - [train] Step 3675 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.259 | Weights_l2 --> 11086.209 | Lr --> 0.009 | Seconds_per_step --> 4.860 |
[2024-08-31 01:05:31,349][Main][INFO] - [train] Step 3700 out of 20000 | Loss --> 1.987 | Grad_l2 --> 0.258 | Weights_l2 --> 11086.707 | Lr --> 0.009 | Seconds_per_step --> 4.929 |
[2024-08-31 01:07:32,792][Main][INFO] - [train] Step 3725 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.257 | Weights_l2 --> 11087.203 | Lr --> 0.009 | Seconds_per_step --> 4.858 |
[2024-08-31 01:09:34,322][Main][INFO] - [train] Step 3750 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.258 | Weights_l2 --> 11087.740 | Lr --> 0.009 | Seconds_per_step --> 4.861 |
[2024-08-31 01:11:37,461][Main][INFO] - [train] Step 3775 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.254 | Weights_l2 --> 11088.254 | Lr --> 0.009 | Seconds_per_step --> 4.925 |
[2024-08-31 01:13:38,849][Main][INFO] - [train] Step 3800 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.254 | Weights_l2 --> 11088.768 | Lr --> 0.009 | Seconds_per_step --> 4.855 |
[2024-08-31 01:15:40,200][Main][INFO] - [train] Step 3825 out of 20000 | Loss --> 2.002 | Grad_l2 --> 0.266 | Weights_l2 --> 11089.319 | Lr --> 0.009 | Seconds_per_step --> 4.854 |
[2024-08-31 01:17:43,611][Main][INFO] - [train] Step 3850 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.254 | Weights_l2 --> 11089.859 | Lr --> 0.009 | Seconds_per_step --> 4.936 |
[2024-08-31 01:19:44,940][Main][INFO] - [train] Step 3875 out of 20000 | Loss --> 1.993 | Grad_l2 --> 0.254 | Weights_l2 --> 11090.401 | Lr --> 0.009 | Seconds_per_step --> 4.853 |
[2024-08-31 01:21:46,567][Main][INFO] - [train] Step 3900 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.261 | Weights_l2 --> 11090.936 | Lr --> 0.009 | Seconds_per_step --> 4.865 |
[2024-08-31 01:23:48,058][Main][INFO] - [train] Step 3925 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.256 | Weights_l2 --> 11091.490 | Lr --> 0.009 | Seconds_per_step --> 4.860 |
[2024-08-31 01:25:50,992][Main][INFO] - [train] Step 3950 out of 20000 | Loss --> 1.973 | Grad_l2 --> 0.265 | Weights_l2 --> 11092.026 | Lr --> 0.009 | Seconds_per_step --> 4.917 |
[2024-08-31 01:27:52,367][Main][INFO] - [train] Step 3975 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.255 | Weights_l2 --> 11092.571 | Lr --> 0.009 | Seconds_per_step --> 4.855 |
[2024-08-31 01:29:53,976][Main][INFO] - [train] Step 4000 out of 20000 | Loss --> 2.002 | Grad_l2 --> 0.244 | Weights_l2 --> 11093.136 | Lr --> 0.009 | Seconds_per_step --> 4.864 |
[2024-08-31 01:31:56,935][Main][INFO] - [train] Step 4025 out of 20000 | Loss --> 1.985 | Grad_l2 --> 0.256 | Weights_l2 --> 11093.707 | Lr --> 0.009 | Seconds_per_step --> 4.918 |
[2024-08-31 01:33:58,294][Main][INFO] - [train] Step 4050 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.250 | Weights_l2 --> 11094.273 | Lr --> 0.009 | Seconds_per_step --> 4.854 |
[2024-08-31 01:35:59,919][Main][INFO] - [train] Step 4075 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.246 | Weights_l2 --> 11094.831 | Lr --> 0.009 | Seconds_per_step --> 4.865 |
[2024-08-31 01:38:02,988][Main][INFO] - [train] Step 4100 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.251 | Weights_l2 --> 11095.407 | Lr --> 0.009 | Seconds_per_step --> 4.923 |
[2024-08-31 01:40:04,344][Main][INFO] - [train] Step 4125 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.254 | Weights_l2 --> 11095.975 | Lr --> 0.009 | Seconds_per_step --> 4.854 |
[2024-08-31 01:42:05,633][Main][INFO] - [train] Step 4150 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.257 | Weights_l2 --> 11096.566 | Lr --> 0.009 | Seconds_per_step --> 4.851 |
[2024-08-31 01:44:08,448][Main][INFO] - [train] Step 4175 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.249 | Weights_l2 --> 11097.158 | Lr --> 0.009 | Seconds_per_step --> 4.913 |
[2024-08-31 01:46:09,650][Main][INFO] - [train] Step 4200 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.248 | Weights_l2 --> 11097.738 | Lr --> 0.009 | Seconds_per_step --> 4.848 |
[2024-08-31 01:48:10,987][Main][INFO] - [train] Step 4225 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.246 | Weights_l2 --> 11098.336 | Lr --> 0.009 | Seconds_per_step --> 4.853 |
[2024-08-31 01:50:14,039][Main][INFO] - [train] Step 4250 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.318 | Weights_l2 --> 11098.973 | Lr --> 0.009 | Seconds_per_step --> 4.922 |
[2024-08-31 01:52:15,498][Main][INFO] - [train] Step 4275 out of 20000 | Loss --> 1.996 | Grad_l2 --> 0.256 | Weights_l2 --> 11099.571 | Lr --> 0.009 | Seconds_per_step --> 4.858 |
[2024-08-31 01:54:16,955][Main][INFO] - [train] Step 4300 out of 20000 | Loss --> 2.036 | Grad_l2 --> 0.532 | Weights_l2 --> 11100.264 | Lr --> 0.009 | Seconds_per_step --> 4.858 |
[2024-08-31 01:56:18,360][Main][INFO] - [train] Step 4325 out of 20000 | Loss --> 2.044 | Grad_l2 --> 0.366 | Weights_l2 --> 11101.149 | Lr --> 0.009 | Seconds_per_step --> 4.856 |
[2024-08-31 01:58:21,269][Main][INFO] - [train] Step 4350 out of 20000 | Loss --> 2.015 | Grad_l2 --> 0.264 | Weights_l2 --> 11101.864 | Lr --> 0.009 | Seconds_per_step --> 4.916 |
[2024-08-31 02:00:22,396][Main][INFO] - [train] Step 4375 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.252 | Weights_l2 --> 11102.529 | Lr --> 0.009 | Seconds_per_step --> 4.845 |
[2024-08-31 02:02:23,794][Main][INFO] - [train] Step 4400 out of 20000 | Loss --> 2.015 | Grad_l2 --> 0.249 | Weights_l2 --> 11103.204 | Lr --> 0.009 | Seconds_per_step --> 4.856 |
[2024-08-31 02:04:26,895][Main][INFO] - [train] Step 4425 out of 20000 | Loss --> 2.011 | Grad_l2 --> 0.251 | Weights_l2 --> 11103.848 | Lr --> 0.009 | Seconds_per_step --> 4.924 |
[2024-08-31 02:06:28,400][Main][INFO] - [train] Step 4450 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.254 | Weights_l2 --> 11104.476 | Lr --> 0.009 | Seconds_per_step --> 4.860 |
[2024-08-31 02:08:29,830][Main][INFO] - [train] Step 4475 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.248 | Weights_l2 --> 11105.117 | Lr --> 0.009 | Seconds_per_step --> 4.857 |
[2024-08-31 02:10:32,840][Main][INFO] - [train] Step 4500 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.250 | Weights_l2 --> 11105.788 | Lr --> 0.009 | Seconds_per_step --> 4.920 |
[2024-08-31 02:12:34,341][Main][INFO] - [train] Step 4525 out of 20000 | Loss --> 2.005 | Grad_l2 --> 0.245 | Weights_l2 --> 11106.418 | Lr --> 0.010 | Seconds_per_step --> 4.860 |
[2024-08-31 02:14:35,799][Main][INFO] - [train] Step 4550 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.253 | Weights_l2 --> 11107.085 | Lr --> 0.010 | Seconds_per_step --> 4.858 |
[2024-08-31 02:16:38,920][Main][INFO] - [train] Step 4575 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.247 | Weights_l2 --> 11107.732 | Lr --> 0.010 | Seconds_per_step --> 4.925 |
[2024-08-31 02:18:40,501][Main][INFO] - [train] Step 4600 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.249 | Weights_l2 --> 11108.387 | Lr --> 0.010 | Seconds_per_step --> 4.863 |
[2024-08-31 02:20:42,148][Main][INFO] - [train] Step 4625 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.250 | Weights_l2 --> 11109.032 | Lr --> 0.010 | Seconds_per_step --> 4.866 |
[2024-08-31 02:22:44,968][Main][INFO] - [train] Step 4650 out of 20000 | Loss --> 2.001 | Grad_l2 --> 0.244 | Weights_l2 --> 11109.699 | Lr --> 0.010 | Seconds_per_step --> 4.913 |
[2024-08-31 02:24:46,423][Main][INFO] - [train] Step 4675 out of 20000 | Loss --> 2.006 | Grad_l2 --> 0.247 | Weights_l2 --> 11110.373 | Lr --> 0.010 | Seconds_per_step --> 4.858 |
[2024-08-31 02:26:47,813][Main][INFO] - [train] Step 4700 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.246 | Weights_l2 --> 11111.053 | Lr --> 0.010 | Seconds_per_step --> 4.856 |
[2024-08-31 02:28:50,977][Main][INFO] - [train] Step 4725 out of 20000 | Loss --> 2.010 | Grad_l2 --> 0.254 | Weights_l2 --> 11111.732 | Lr --> 0.010 | Seconds_per_step --> 4.926 |
[2024-08-31 02:30:52,336][Main][INFO] - [train] Step 4750 out of 20000 | Loss --> 2.008 | Grad_l2 --> 0.248 | Weights_l2 --> 11112.423 | Lr --> 0.010 | Seconds_per_step --> 4.854 |
[2024-08-31 02:32:53,779][Main][INFO] - [train] Step 4775 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.244 | Weights_l2 --> 11113.116 | Lr --> 0.010 | Seconds_per_step --> 4.858 |
[2024-08-31 02:34:55,196][Main][INFO] - [train] Step 4800 out of 20000 | Loss --> 2.005 | Grad_l2 --> 0.254 | Weights_l2 --> 11113.816 | Lr --> 0.010 | Seconds_per_step --> 4.857 |
[2024-08-31 02:36:58,207][Main][INFO] - [train] Step 4825 out of 20000 | Loss --> 2.008 | Grad_l2 --> 0.248 | Weights_l2 --> 11114.519 | Lr --> 0.010 | Seconds_per_step --> 4.920 |
[2024-08-31 02:38:59,687][Main][INFO] - [train] Step 4850 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.242 | Weights_l2 --> 11115.234 | Lr --> 0.010 | Seconds_per_step --> 4.859 |
[2024-08-31 02:41:01,077][Main][INFO] - [train] Step 4875 out of 20000 | Loss --> 2.018 | Grad_l2 --> 0.241 | Weights_l2 --> 11115.945 | Lr --> 0.010 | Seconds_per_step --> 4.856 |
[2024-08-31 02:43:04,045][Main][INFO] - [train] Step 4900 out of 20000 | Loss --> 2.019 | Grad_l2 --> 0.252 | Weights_l2 --> 11116.665 | Lr --> 0.010 | Seconds_per_step --> 4.919 |
[2024-08-31 02:45:05,336][Main][INFO] - [train] Step 4925 out of 20000 | Loss --> 2.021 | Grad_l2 --> 0.247 | Weights_l2 --> 11117.383 | Lr --> 0.010 | Seconds_per_step --> 4.852 |
[2024-08-31 02:47:06,741][Main][INFO] - [train] Step 4950 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.269 | Weights_l2 --> 11118.089 | Lr --> 0.010 | Seconds_per_step --> 4.856 |
[2024-08-31 02:49:09,888][Main][INFO] - [train] Step 4975 out of 20000 | Loss --> 2.019 | Grad_l2 --> 0.254 | Weights_l2 --> 11118.832 | Lr --> 0.010 | Seconds_per_step --> 4.926 |
[2024-08-31 02:51:11,212][Main][INFO] - [train] Step 5000 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.246 | Weights_l2 --> 11119.573 | Lr --> 0.010 | Seconds_per_step --> 4.853 |
[2024-08-31 02:51:11,213][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-5000
[2024-08-31 02:51:11,220][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving.
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 02:51:17,854][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-5000/model.safetensors [2024-08-31 02:51:26,221][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-5000/optimizer.bin [2024-08-31 02:51:26,225][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-5000/scheduler.bin [2024-08-31 02:51:26,226][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-5000/sampler.bin [2024-08-31 02:51:26,226][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-5000/sampler_1.bin [2024-08-31 02:51:26,227][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-5000/random_states_0.pkl [2024-08-31 02:53:27,362][Main][INFO] - [train] Step 5025 out of 20000 | Loss --> 2.012 | Grad_l2 --> 0.243 | Weights_l2 --> 11120.327 | Lr --> 0.010 | Seconds_per_step --> 5.446 | [2024-08-31 02:55:30,552][Main][INFO] - [train] Step 5050 out of 20000 | Loss --> 2.008 | Grad_l2 --> 0.247 | Weights_l2 --> 11121.066 | Lr --> 0.010 | Seconds_per_step --> 4.928 | [2024-08-31 02:57:32,023][Main][INFO] - [train] Step 5075 out of 20000 | Loss --> 2.019 | Grad_l2 --> 0.242 | Weights_l2 --> 11121.812 | Lr --> 0.010 | Seconds_per_step --> 4.859 | [2024-08-31 02:59:33,592][Main][INFO] - [train] Step 5100 out of 20000 | Loss --> 2.020 | Grad_l2 --> 0.238 | Weights_l2 --> 11122.546 | Lr --> 0.010 | Seconds_per_step --> 4.863 | [2024-08-31 03:01:36,526][Main][INFO] - [train] Step 5125 out of 20000 | Loss --> 2.017 | Grad_l2 --> 0.242 | Weights_l2 --> 11123.276 | Lr --> 0.010 | Seconds_per_step --> 4.917 | [2024-08-31 03:03:37,883][Main][INFO] - [train] Step 5150 out of 20000 | Loss --> 2.012 | Grad_l2 --> 0.238 | Weights_l2 --> 11124.032 | Lr --> 0.010 | Seconds_per_step --> 4.854 | [2024-08-31 03:05:39,333][Main][INFO] - [train] Step 5175 out of 20000 | Loss 
--> 2.013 | Grad_l2 --> 0.245 | Weights_l2 --> 11124.764 | Lr --> 0.010 | Seconds_per_step --> 4.858 | [2024-08-31 03:07:40,837][Main][INFO] - [train] Step 5200 out of 20000 | Loss --> 2.006 | Grad_l2 --> 0.242 | Weights_l2 --> 11125.489 | Lr --> 0.010 | Seconds_per_step --> 4.860 | [2024-08-31 03:09:43,880][Main][INFO] - [train] Step 5225 out of 20000 | Loss --> 2.005 | Grad_l2 --> 0.245 | Weights_l2 --> 11126.212 | Lr --> 0.010 | Seconds_per_step --> 4.922 | [2024-08-31 03:11:45,390][Main][INFO] - [train] Step 5250 out of 20000 | Loss --> 2.029 | Grad_l2 --> 0.241 | Weights_l2 --> 11126.974 | Lr --> 0.010 | Seconds_per_step --> 4.860 | [2024-08-31 03:13:46,974][Main][INFO] - [train] Step 5275 out of 20000 | Loss --> 2.027 | Grad_l2 --> 0.241 | Weights_l2 --> 11127.722 | Lr --> 0.010 | Seconds_per_step --> 4.863 | [2024-08-31 03:15:49,986][Main][INFO] - [train] Step 5300 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.247 | Weights_l2 --> 11128.486 | Lr --> 0.010 | Seconds_per_step --> 4.920 | [2024-08-31 03:17:51,543][Main][INFO] - [train] Step 5325 out of 20000 | Loss --> 2.021 | Grad_l2 --> 0.242 | Weights_l2 --> 11129.234 | Lr --> 0.010 | Seconds_per_step --> 4.862 | [2024-08-31 03:19:53,074][Main][INFO] - [train] Step 5350 out of 20000 | Loss --> 2.011 | Grad_l2 --> 0.237 | Weights_l2 --> 11129.982 | Lr --> 0.010 | Seconds_per_step --> 4.861 | [2024-08-31 03:21:56,353][Main][INFO] - [train] Step 5375 out of 20000 | Loss --> 2.020 | Grad_l2 --> 0.243 | Weights_l2 --> 11130.720 | Lr --> 0.010 | Seconds_per_step --> 4.931 | [2024-08-31 03:23:57,953][Main][INFO] - [train] Step 5400 out of 20000 | Loss --> 2.008 | Grad_l2 --> 0.238 | Weights_l2 --> 11131.434 | Lr --> 0.010 | Seconds_per_step --> 4.864 | [2024-08-31 03:25:59,600][Main][INFO] - [train] Step 5425 out of 20000 | Loss --> 2.013 | Grad_l2 --> 0.238 | Weights_l2 --> 11132.177 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 03:28:02,811][Main][INFO] - [train] Step 5450 out of 20000 | Loss --> 
2.018 | Grad_l2 --> 0.242 | Weights_l2 --> 11132.914 | Lr --> 0.010 | Seconds_per_step --> 4.928 | [2024-08-31 03:30:04,540][Main][INFO] - [train] Step 5475 out of 20000 | Loss --> 2.012 | Grad_l2 --> 0.241 | Weights_l2 --> 11133.642 | Lr --> 0.010 | Seconds_per_step --> 4.869 | [2024-08-31 03:32:06,189][Main][INFO] - [train] Step 5500 out of 20000 | Loss --> 2.021 | Grad_l2 --> 0.244 | Weights_l2 --> 11134.392 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 03:34:09,265][Main][INFO] - [train] Step 5525 out of 20000 | Loss --> 2.011 | Grad_l2 --> 0.233 | Weights_l2 --> 11135.139 | Lr --> 0.010 | Seconds_per_step --> 4.923 | [2024-08-31 03:36:10,852][Main][INFO] - [train] Step 5550 out of 20000 | Loss --> 2.022 | Grad_l2 --> 0.238 | Weights_l2 --> 11135.889 | Lr --> 0.010 | Seconds_per_step --> 4.863 | [2024-08-31 03:38:12,355][Main][INFO] - [train] Step 5575 out of 20000 | Loss --> 2.008 | Grad_l2 --> 0.236 | Weights_l2 --> 11136.605 | Lr --> 0.010 | Seconds_per_step --> 4.860 | [2024-08-31 03:40:15,464][Main][INFO] - [train] Step 5600 out of 20000 | Loss --> 2.014 | Grad_l2 --> 0.240 | Weights_l2 --> 11137.342 | Lr --> 0.010 | Seconds_per_step --> 4.924 | [2024-08-31 03:42:17,062][Main][INFO] - [train] Step 5625 out of 20000 | Loss --> 2.018 | Grad_l2 --> 0.237 | Weights_l2 --> 11138.081 | Lr --> 0.010 | Seconds_per_step --> 4.864 | [2024-08-31 03:44:18,807][Main][INFO] - [train] Step 5650 out of 20000 | Loss --> 2.019 | Grad_l2 --> 0.248 | Weights_l2 --> 11138.844 | Lr --> 0.010 | Seconds_per_step --> 4.870 | [2024-08-31 03:46:20,519][Main][INFO] - [train] Step 5675 out of 20000 | Loss --> 2.015 | Grad_l2 --> 0.237 | Weights_l2 --> 11139.548 | Lr --> 0.010 | Seconds_per_step --> 4.868 | [2024-08-31 03:48:23,902][Main][INFO] - [train] Step 5700 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.233 | Weights_l2 --> 11140.264 | Lr --> 0.010 | Seconds_per_step --> 4.935 | [2024-08-31 03:50:25,727][Main][INFO] - [train] Step 5725 out of 20000 | Loss --> 2.025 | 
Grad_l2 --> 0.237 | Weights_l2 --> 11140.993 | Lr --> 0.010 | Seconds_per_step --> 4.873 | [2024-08-31 03:52:27,385][Main][INFO] - [train] Step 5750 out of 20000 | Loss --> 2.019 | Grad_l2 --> 0.236 | Weights_l2 --> 11141.751 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 03:54:30,624][Main][INFO] - [train] Step 5775 out of 20000 | Loss --> 2.026 | Grad_l2 --> 0.238 | Weights_l2 --> 11142.483 | Lr --> 0.010 | Seconds_per_step --> 4.929 | [2024-08-31 03:56:32,278][Main][INFO] - [train] Step 5800 out of 20000 | Loss --> 2.014 | Grad_l2 --> 0.236 | Weights_l2 --> 11143.216 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 03:58:33,934][Main][INFO] - [train] Step 5825 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.233 | Weights_l2 --> 11143.929 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 04:00:37,235][Main][INFO] - [train] Step 5850 out of 20000 | Loss --> 2.024 | Grad_l2 --> 0.237 | Weights_l2 --> 11144.689 | Lr --> 0.010 | Seconds_per_step --> 4.932 | [2024-08-31 04:02:39,003][Main][INFO] - [train] Step 5875 out of 20000 | Loss --> 2.012 | Grad_l2 --> 0.241 | Weights_l2 --> 11145.441 | Lr --> 0.010 | Seconds_per_step --> 4.871 | [2024-08-31 04:04:40,715][Main][INFO] - [train] Step 5900 out of 20000 | Loss --> 2.026 | Grad_l2 --> 0.234 | Weights_l2 --> 11146.163 | Lr --> 0.010 | Seconds_per_step --> 4.868 | [2024-08-31 04:06:44,138][Main][INFO] - [train] Step 5925 out of 20000 | Loss --> 2.031 | Grad_l2 --> 0.237 | Weights_l2 --> 11146.904 | Lr --> 0.010 | Seconds_per_step --> 4.937 | [2024-08-31 04:08:45,733][Main][INFO] - [train] Step 5950 out of 20000 | Loss --> 2.010 | Grad_l2 --> 0.238 | Weights_l2 --> 11147.634 | Lr --> 0.010 | Seconds_per_step --> 4.864 | [2024-08-31 04:10:47,257][Main][INFO] - [train] Step 5975 out of 20000 | Loss --> 2.016 | Grad_l2 --> 0.236 | Weights_l2 --> 11148.369 | Lr --> 0.010 | Seconds_per_step --> 4.861 | [2024-08-31 04:12:50,368][Main][INFO] - [train] Step 6000 out of 20000 | Loss --> 2.015 | Grad_l2 
--> 0.232 | Weights_l2 --> 11149.084 | Lr --> 0.010 | Seconds_per_step --> 4.924 | [2024-08-31 04:14:52,069][Main][INFO] - [train] Step 6025 out of 20000 | Loss --> 2.028 | Grad_l2 --> 0.234 | Weights_l2 --> 11149.818 | Lr --> 0.010 | Seconds_per_step --> 4.868 | [2024-08-31 04:16:53,956][Main][INFO] - [train] Step 6050 out of 20000 | Loss --> 2.014 | Grad_l2 --> 0.232 | Weights_l2 --> 11150.537 | Lr --> 0.010 | Seconds_per_step --> 4.875 | [2024-08-31 04:18:55,605][Main][INFO] - [train] Step 6075 out of 20000 | Loss --> 2.016 | Grad_l2 --> 0.236 | Weights_l2 --> 11151.255 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 04:20:58,945][Main][INFO] - [train] Step 6100 out of 20000 | Loss --> 2.010 | Grad_l2 --> 0.231 | Weights_l2 --> 11152.001 | Lr --> 0.010 | Seconds_per_step --> 4.933 | [2024-08-31 04:23:00,761][Main][INFO] - [train] Step 6125 out of 20000 | Loss --> 2.018 | Grad_l2 --> 0.237 | Weights_l2 --> 11152.727 | Lr --> 0.010 | Seconds_per_step --> 4.873 | [2024-08-31 04:25:02,679][Main][INFO] - [train] Step 6150 out of 20000 | Loss --> 2.026 | Grad_l2 --> 0.234 | Weights_l2 --> 11153.455 | Lr --> 0.010 | Seconds_per_step --> 4.877 | [2024-08-31 04:27:05,980][Main][INFO] - [train] Step 6175 out of 20000 | Loss --> 2.021 | Grad_l2 --> 0.231 | Weights_l2 --> 11154.181 | Lr --> 0.010 | Seconds_per_step --> 4.932 | [2024-08-31 04:29:07,751][Main][INFO] - [train] Step 6200 out of 20000 | Loss --> 2.019 | Grad_l2 --> 0.231 | Weights_l2 --> 11154.897 | Lr --> 0.010 | Seconds_per_step --> 4.871 | [2024-08-31 04:31:09,676][Main][INFO] - [train] Step 6225 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.232 | Weights_l2 --> 11155.617 | Lr --> 0.010 | Seconds_per_step --> 4.877 | [2024-08-31 04:33:13,009][Main][INFO] - [train] Step 6250 out of 20000 | Loss --> 2.018 | Grad_l2 --> 0.233 | Weights_l2 --> 11156.325 | Lr --> 0.010 | Seconds_per_step --> 4.933 | [2024-08-31 04:35:14,904][Main][INFO] - [train] Step 6275 out of 20000 | Loss --> 2.014 | Grad_l2 --> 
0.228 | Weights_l2 --> 11157.060 | Lr --> 0.010 | Seconds_per_step --> 4.876 | [2024-08-31 04:37:16,631][Main][INFO] - [train] Step 6300 out of 20000 | Loss --> 2.023 | Grad_l2 --> 0.233 | Weights_l2 --> 11157.794 | Lr --> 0.010 | Seconds_per_step --> 4.869 | [2024-08-31 04:39:19,858][Main][INFO] - [train] Step 6325 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.233 | Weights_l2 --> 11158.501 | Lr --> 0.010 | Seconds_per_step --> 4.929 | [2024-08-31 04:41:21,687][Main][INFO] - [train] Step 6350 out of 20000 | Loss --> 2.016 | Grad_l2 --> 0.235 | Weights_l2 --> 11159.231 | Lr --> 0.010 | Seconds_per_step --> 4.873 | [2024-08-31 04:43:23,441][Main][INFO] - [train] Step 6375 out of 20000 | Loss --> 2.012 | Grad_l2 --> 0.228 | Weights_l2 --> 11159.925 | Lr --> 0.010 | Seconds_per_step --> 4.870 | [2024-08-31 04:45:26,835][Main][INFO] - [train] Step 6400 out of 20000 | Loss --> 2.013 | Grad_l2 --> 0.227 | Weights_l2 --> 11160.632 | Lr --> 0.010 | Seconds_per_step --> 4.936 | [2024-08-31 04:47:28,522][Main][INFO] - [train] Step 6425 out of 20000 | Loss --> 2.017 | Grad_l2 --> 0.228 | Weights_l2 --> 11161.336 | Lr --> 0.010 | Seconds_per_step --> 4.867 | [2024-08-31 04:49:30,159][Main][INFO] - [train] Step 6450 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.232 | Weights_l2 --> 11162.033 | Lr --> 0.010 | Seconds_per_step --> 4.865 | [2024-08-31 04:51:33,596][Main][INFO] - [train] Step 6475 out of 20000 | Loss --> 2.010 | Grad_l2 --> 0.227 | Weights_l2 --> 11162.738 | Lr --> 0.010 | Seconds_per_step --> 4.937 | [2024-08-31 04:53:35,348][Main][INFO] - [train] Step 6500 out of 20000 | Loss --> 2.010 | Grad_l2 --> 0.240 | Weights_l2 --> 11163.427 | Lr --> 0.010 | Seconds_per_step --> 4.870 | [2024-08-31 04:55:37,253][Main][INFO] - [train] Step 6525 out of 20000 | Loss --> 2.013 | Grad_l2 --> 0.228 | Weights_l2 --> 11164.142 | Lr --> 0.010 | Seconds_per_step --> 4.876 | [2024-08-31 04:57:39,041][Main][INFO] - [train] Step 6550 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.230 | 
Weights_l2 --> 11164.855 | Lr --> 0.010 | Seconds_per_step --> 4.871 | [2024-08-31 04:59:42,290][Main][INFO] - [train] Step 6575 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.230 | Weights_l2 --> 11165.581 | Lr --> 0.010 | Seconds_per_step --> 4.930 | [2024-08-31 05:01:44,028][Main][INFO] - [train] Step 6600 out of 20000 | Loss --> 2.016 | Grad_l2 --> 0.231 | Weights_l2 --> 11166.269 | Lr --> 0.010 | Seconds_per_step --> 4.869 | [2024-08-31 05:03:45,782][Main][INFO] - [train] Step 6625 out of 20000 | Loss --> 2.008 | Grad_l2 --> 0.232 | Weights_l2 --> 11166.991 | Lr --> 0.010 | Seconds_per_step --> 4.870 | [2024-08-31 05:05:49,030][Main][INFO] - [train] Step 6650 out of 20000 | Loss --> 2.017 | Grad_l2 --> 0.231 | Weights_l2 --> 11167.706 | Lr --> 0.010 | Seconds_per_step --> 4.930 | [2024-08-31 05:07:50,781][Main][INFO] - [train] Step 6675 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.231 | Weights_l2 --> 11168.410 | Lr --> 0.010 | Seconds_per_step --> 4.870 | [2024-08-31 05:09:52,662][Main][INFO] - [train] Step 6700 out of 20000 | Loss --> 2.002 | Grad_l2 --> 0.234 | Weights_l2 --> 11169.116 | Lr --> 0.010 | Seconds_per_step --> 4.875 | [2024-08-31 05:11:55,788][Main][INFO] - [train] Step 6725 out of 20000 | Loss --> 2.006 | Grad_l2 --> 0.230 | Weights_l2 --> 11169.797 | Lr --> 0.010 | Seconds_per_step --> 4.925 | [2024-08-31 05:13:57,342][Main][INFO] - [train] Step 6750 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.228 | Weights_l2 --> 11170.484 | Lr --> 0.010 | Seconds_per_step --> 4.862 | [2024-08-31 05:15:59,015][Main][INFO] - [train] Step 6775 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.232 | Weights_l2 --> 11171.169 | Lr --> 0.010 | Seconds_per_step --> 4.867 | [2024-08-31 05:18:02,378][Main][INFO] - [train] Step 6800 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.228 | Weights_l2 --> 11171.849 | Lr --> 0.010 | Seconds_per_step --> 4.934 | [2024-08-31 05:20:04,556][Main][INFO] - [train] Step 6825 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.227 | 
Weights_l2 --> 11172.540 | Lr --> 0.010 | Seconds_per_step --> 4.887 | [2024-08-31 05:22:06,769][Main][INFO] - [train] Step 6850 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.227 | Weights_l2 --> 11173.248 | Lr --> 0.010 | Seconds_per_step --> 4.888 | [2024-08-31 05:24:10,797][Main][INFO] - [train] Step 6875 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.225 | Weights_l2 --> 11173.896 | Lr --> 0.010 | Seconds_per_step --> 4.961 | [2024-08-31 05:26:12,755][Main][INFO] - [train] Step 6900 out of 20000 | Loss --> 2.009 | Grad_l2 --> 0.227 | Weights_l2 --> 11174.600 | Lr --> 0.010 | Seconds_per_step --> 4.878 | [2024-08-31 05:28:14,420][Main][INFO] - [train] Step 6925 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.227 | Weights_l2 --> 11175.299 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 05:30:16,265][Main][INFO] - [train] Step 6950 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.225 | Weights_l2 --> 11175.990 | Lr --> 0.010 | Seconds_per_step --> 4.874 | [2024-08-31 05:32:19,574][Main][INFO] - [train] Step 6975 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.226 | Weights_l2 --> 11176.680 | Lr --> 0.010 | Seconds_per_step --> 4.932 | [2024-08-31 05:34:21,084][Main][INFO] - [train] Step 7000 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.223 | Weights_l2 --> 11177.361 | Lr --> 0.010 | Seconds_per_step --> 4.860 | [2024-08-31 05:36:22,735][Main][INFO] - [train] Step 7025 out of 20000 | Loss --> 2.006 | Grad_l2 --> 0.221 | Weights_l2 --> 11178.031 | Lr --> 0.010 | Seconds_per_step --> 4.866 | [2024-08-31 05:38:26,261][Main][INFO] - [train] Step 7050 out of 20000 | Loss --> 2.007 | Grad_l2 --> 0.225 | Weights_l2 --> 11178.693 | Lr --> 0.010 | Seconds_per_step --> 4.941 | [2024-08-31 05:40:27,644][Main][INFO] - [train] Step 7075 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.224 | Weights_l2 --> 11179.383 | Lr --> 0.010 | Seconds_per_step --> 4.855 | [2024-08-31 05:42:29,048][Main][INFO] - [train] Step 7100 out of 20000 | Loss --> 2.003 | Grad_l2 --> 0.226 | 
Weights_l2 --> 11180.056 | Lr --> 0.010 | Seconds_per_step --> 4.856 | [2024-08-31 05:44:31,950][Main][INFO] - [train] Step 7125 out of 20000 | Loss --> 1.999 | Grad_l2 --> 0.224 | Weights_l2 --> 11180.729 | Lr --> 0.010 | Seconds_per_step --> 4.916 | [2024-08-31 05:46:33,708][Main][INFO] - [train] Step 7150 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.220 | Weights_l2 --> 11181.393 | Lr --> 0.010 | Seconds_per_step --> 4.870 | [2024-08-31 05:48:35,396][Main][INFO] - [train] Step 7175 out of 20000 | Loss --> 1.995 | Grad_l2 --> 0.228 | Weights_l2 --> 11182.054 | Lr --> 0.009 | Seconds_per_step --> 4.867 | [2024-08-31 05:50:38,558][Main][INFO] - [train] Step 7200 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.229 | Weights_l2 --> 11182.740 | Lr --> 0.009 | Seconds_per_step --> 4.926 | [2024-08-31 05:52:40,277][Main][INFO] - [train] Step 7225 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.227 | Weights_l2 --> 11183.395 | Lr --> 0.009 | Seconds_per_step --> 4.869 | [2024-08-31 05:54:41,932][Main][INFO] - [train] Step 7250 out of 20000 | Loss --> 1.994 | Grad_l2 --> 0.223 | Weights_l2 --> 11184.058 | Lr --> 0.009 | Seconds_per_step --> 4.866 | [2024-08-31 05:56:45,155][Main][INFO] - [train] Step 7275 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.226 | Weights_l2 --> 11184.698 | Lr --> 0.009 | Seconds_per_step --> 4.929 | [2024-08-31 05:58:46,798][Main][INFO] - [train] Step 7300 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.232 | Weights_l2 --> 11185.348 | Lr --> 0.009 | Seconds_per_step --> 4.866 | [2024-08-31 06:00:48,506][Main][INFO] - [train] Step 7325 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.222 | Weights_l2 --> 11186.006 | Lr --> 0.009 | Seconds_per_step --> 4.868 | [2024-08-31 06:02:52,056][Main][INFO] - [train] Step 7350 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.226 | Weights_l2 --> 11186.669 | Lr --> 0.009 | Seconds_per_step --> 4.942 | [2024-08-31 06:04:53,689][Main][INFO] - [train] Step 7375 out of 20000 | Loss --> 1.989 | Grad_l2 --> 0.223 | 
Weights_l2 --> 11187.321 | Lr --> 0.009 | Seconds_per_step --> 4.865 | [2024-08-31 06:06:55,306][Main][INFO] - [train] Step 7400 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.225 | Weights_l2 --> 11187.976 | Lr --> 0.009 | Seconds_per_step --> 4.865 | [2024-08-31 06:08:56,923][Main][INFO] - [train] Step 7425 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.226 | Weights_l2 --> 11188.622 | Lr --> 0.009 | Seconds_per_step --> 4.865 | [2024-08-31 06:11:00,109][Main][INFO] - [train] Step 7450 out of 20000 | Loss --> 2.004 | Grad_l2 --> 0.225 | Weights_l2 --> 11189.273 | Lr --> 0.009 | Seconds_per_step --> 4.927 | [2024-08-31 06:13:01,928][Main][INFO] - [train] Step 7475 out of 20000 | Loss --> 2.000 | Grad_l2 --> 0.224 | Weights_l2 --> 11189.893 | Lr --> 0.009 | Seconds_per_step --> 4.873 | [2024-08-31 06:15:03,410][Main][INFO] - [train] Step 7500 out of 20000 | Loss --> 1.997 | Grad_l2 --> 0.224 | Weights_l2 --> 11190.554 | Lr --> 0.009 | Seconds_per_step --> 4.859 | [2024-08-31 06:15:03,411][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-7500 [2024-08-31 06:15:03,418][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 06:15:11,108][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-7500/model.safetensors [2024-08-31 06:15:20,442][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-7500/optimizer.bin [2024-08-31 06:15:20,444][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-7500/scheduler.bin [2024-08-31 06:15:20,444][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-7500/sampler.bin [2024-08-31 06:15:20,444][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-7500/sampler_1.bin [2024-08-31 06:15:20,446][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-7500/random_states_0.pkl [2024-08-31 06:17:23,174][Main][INFO] - [train] Step 7525 out of 20000 | Loss --> 1.996 | Grad_l2 --> 0.229 | Weights_l2 --> 11191.205 | Lr --> 0.009 | Seconds_per_step --> 5.590 | [2024-08-31 06:19:24,645][Main][INFO] - [train] Step 7550 out of 20000 | Loss --> 1.998 | Grad_l2 --> 0.227 | Weights_l2 --> 11191.856 | Lr --> 0.009 | Seconds_per_step --> 4.859 | [2024-08-31 06:21:26,400][Main][INFO] - [train] Step 7575 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.221 | Weights_l2 --> 11192.472 | Lr --> 0.009 | Seconds_per_step --> 4.870 | [2024-08-31 06:23:29,410][Main][INFO] - [train] Step 7600 out of 20000 | Loss --> 1.990 | Grad_l2 --> 0.221 | Weights_l2 --> 11193.099 | Lr --> 0.009 | Seconds_per_step --> 4.920 | [2024-08-31 06:25:31,276][Main][INFO] - [train] Step 7625 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.216 | Weights_l2 --> 11193.740 | Lr --> 0.009 | Seconds_per_step --> 4.875 | [2024-08-31 06:27:32,808][Main][INFO] - [train] Step 7650 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.222 | Weights_l2 --> 11194.339 | Lr --> 0.009 | Seconds_per_step --> 4.861 | [2024-08-31 06:29:36,008][Main][INFO] - [train] Step 7675 out of 20000 | Loss 
--> 1.998 | Grad_l2 --> 0.223 | Weights_l2 --> 11194.963 | Lr --> 0.009 | Seconds_per_step --> 4.928 | [2024-08-31 06:31:37,749][Main][INFO] - [train] Step 7700 out of 20000 | Loss --> 1.986 | Grad_l2 --> 0.218 | Weights_l2 --> 11195.606 | Lr --> 0.009 | Seconds_per_step --> 4.870 | [2024-08-31 06:33:39,477][Main][INFO] - [train] Step 7725 out of 20000 | Loss --> 1.988 | Grad_l2 --> 0.221 | Weights_l2 --> 11196.225 | Lr --> 0.009 | Seconds_per_step --> 4.869 | [2024-08-31 06:35:42,742][Main][INFO] - [train] Step 7750 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.221 | Weights_l2 --> 11196.852 | Lr --> 0.009 | Seconds_per_step --> 4.930 | [2024-08-31 06:37:44,471][Main][INFO] - [train] Step 7775 out of 20000 | Loss --> 1.982 | Grad_l2 --> 0.218 | Weights_l2 --> 11197.473 | Lr --> 0.009 | Seconds_per_step --> 4.869 | [2024-08-31 06:39:46,678][Main][INFO] - [train] Step 7800 out of 20000 | Loss --> 1.981 | Grad_l2 --> 0.218 | Weights_l2 --> 11198.081 | Lr --> 0.009 | Seconds_per_step --> 4.888 | [2024-08-31 06:41:50,230][Main][INFO] - [train] Step 7825 out of 20000 | Loss --> 1.992 | Grad_l2 --> 0.221 | Weights_l2 --> 11198.689 | Lr --> 0.009 | Seconds_per_step --> 4.942 | [2024-08-31 06:43:51,733][Main][INFO] - [train] Step 7850 out of 20000 | Loss --> 1.991 | Grad_l2 --> 0.215 | Weights_l2 --> 11199.295 | Lr --> 0.009 | Seconds_per_step --> 4.860 | [2024-08-31 06:45:53,476][Main][INFO] - [train] Step 7875 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.222 | Weights_l2 --> 11199.887 | Lr --> 0.009 | Seconds_per_step --> 4.870 | [2024-08-31 06:47:55,013][Main][INFO] - [train] Step 7900 out of 20000 | Loss --> 1.978 | Grad_l2 --> 0.222 | Weights_l2 --> 11200.476 | Lr --> 0.009 | Seconds_per_step --> 4.861 | [2024-08-31 06:49:58,328][Main][INFO] - [train] Step 7925 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.218 | Weights_l2 --> 11201.089 | Lr --> 0.009 | Seconds_per_step --> 4.932 | [2024-08-31 06:52:00,278][Main][INFO] - [train] Step 7950 out of 20000 | Loss --> 
1.976 | Grad_l2 --> 0.218 | Weights_l2 --> 11201.688 | Lr --> 0.009 | Seconds_per_step --> 4.878 | [2024-08-31 06:54:01,999][Main][INFO] - [train] Step 7975 out of 20000 | Loss --> 1.977 | Grad_l2 --> 0.215 | Weights_l2 --> 11202.277 | Lr --> 0.009 | Seconds_per_step --> 4.869 | [2024-08-31 06:56:05,299][Main][INFO] - [train] Step 8000 out of 20000 | Loss --> 1.984 | Grad_l2 --> 0.218 | Weights_l2 --> 11202.881 | Lr --> 0.009 | Seconds_per_step --> 4.932 | [2024-08-31 06:58:07,085][Main][INFO] - [train] Step 8025 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.222 | Weights_l2 --> 11203.468 | Lr --> 0.009 | Seconds_per_step --> 4.871 | [2024-08-31 07:00:08,837][Main][INFO] - [train] Step 8050 out of 20000 | Loss --> 1.972 | Grad_l2 --> 0.222 | Weights_l2 --> 11204.040 | Lr --> 0.009 | Seconds_per_step --> 4.870 | [2024-08-31 07:02:12,006][Main][INFO] - [train] Step 8075 out of 20000 | Loss --> 1.980 | Grad_l2 --> 0.220 | Weights_l2 --> 11204.656 | Lr --> 0.009 | Seconds_per_step --> 4.927 | [2024-08-31 07:04:13,605][Main][INFO] - [train] Step 8100 out of 20000 | Loss --> 1.960 | Grad_l2 --> 0.217 | Weights_l2 --> 11205.246 | Lr --> 0.009 | Seconds_per_step --> 4.864 | [2024-08-31 07:06:15,513][Main][INFO] - [train] Step 8125 out of 20000 | Loss --> 1.971 | Grad_l2 --> 0.217 | Weights_l2 --> 11205.844 | Lr --> 0.009 | Seconds_per_step --> 4.876 | [2024-08-31 07:08:18,638][Main][INFO] - [train] Step 8150 out of 20000 | Loss --> 1.964 | Grad_l2 --> 0.216 | Weights_l2 --> 11206.444 | Lr --> 0.009 | Seconds_per_step --> 4.925 | [2024-08-31 07:10:20,335][Main][INFO] - [train] Step 8175 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.221 | Weights_l2 --> 11207.015 | Lr --> 0.009 | Seconds_per_step --> 4.868 | [2024-08-31 07:12:21,978][Main][INFO] - [train] Step 8200 out of 20000 | Loss --> 1.971 | Grad_l2 --> 0.222 | Weights_l2 --> 11207.580 | Lr --> 0.009 | Seconds_per_step --> 4.866 | [2024-08-31 07:14:25,220][Main][INFO] - [train] Step 8225 out of 20000 | Loss --> 1.969 | 
Grad_l2 --> 0.215 | Weights_l2 --> 11208.164 | Lr --> 0.009 | Seconds_per_step --> 4.930 | [2024-08-31 07:16:26,949][Main][INFO] - [train] Step 8250 out of 20000 | Loss --> 1.976 | Grad_l2 --> 0.217 | Weights_l2 --> 11208.739 | Lr --> 0.009 | Seconds_per_step --> 4.869 | [2024-08-31 07:18:28,653][Main][INFO] - [train] Step 8275 out of 20000 | Loss --> 1.965 | Grad_l2 --> 0.226 | Weights_l2 --> 11209.299 | Lr --> 0.009 | Seconds_per_step --> 4.868 | [2024-08-31 07:20:30,259][Main][INFO] - [train] Step 8300 out of 20000 | Loss --> 1.958 | Grad_l2 --> 0.219 | Weights_l2 --> 11209.873 | Lr --> 0.009 | Seconds_per_step --> 4.864 | [2024-08-31 07:22:33,455][Main][INFO] - [train] Step 8325 out of 20000 | Loss --> 1.967 | Grad_l2 --> 0.216 | Weights_l2 --> 11210.450 | Lr --> 0.009 | Seconds_per_step --> 4.928 | [2024-08-31 07:24:35,013][Main][INFO] - [train] Step 8350 out of 20000 | Loss --> 1.958 | Grad_l2 --> 0.217 | Weights_l2 --> 11211.010 | Lr --> 0.009 | Seconds_per_step --> 4.862 | [2024-08-31 07:26:36,485][Main][INFO] - [train] Step 8375 out of 20000 | Loss --> 1.982 | Grad_l2 --> 0.213 | Weights_l2 --> 11211.579 | Lr --> 0.009 | Seconds_per_step --> 4.859 | [2024-08-31 07:28:39,503][Main][INFO] - [train] Step 8400 out of 20000 | Loss --> 1.966 | Grad_l2 --> 0.212 | Weights_l2 --> 11212.111 | Lr --> 0.009 | Seconds_per_step --> 4.921 | [2024-08-31 07:30:40,974][Main][INFO] - [train] Step 8425 out of 20000 | Loss --> 1.963 | Grad_l2 --> 0.211 | Weights_l2 --> 11212.650 | Lr --> 0.009 | Seconds_per_step --> 4.859 | [2024-08-31 07:32:42,449][Main][INFO] - [train] Step 8450 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.212 | Weights_l2 --> 11213.206 | Lr --> 0.009 | Seconds_per_step --> 4.859 | [2024-08-31 07:34:45,622][Main][INFO] - [train] Step 8475 out of 20000 | Loss --> 1.955 | Grad_l2 --> 0.211 | Weights_l2 --> 11213.737 | Lr --> 0.009 | Seconds_per_step --> 4.927 | [2024-08-31 07:36:47,179][Main][INFO] - [train] Step 8500 out of 20000 | Loss --> 1.953 | Grad_l2 
--> 0.212 | Weights_l2 --> 11214.285 | Lr --> 0.009 | Seconds_per_step --> 4.862 | [2024-08-31 07:38:48,668][Main][INFO] - [train] Step 8525 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.212 | Weights_l2 --> 11214.823 | Lr --> 0.009 | Seconds_per_step --> 4.859 | [2024-08-31 07:40:51,793][Main][INFO] - [train] Step 8550 out of 20000 | Loss --> 1.959 | Grad_l2 --> 0.215 | Weights_l2 --> 11215.363 | Lr --> 0.009 | Seconds_per_step --> 4.925 | [2024-08-31 07:42:53,199][Main][INFO] - [train] Step 8575 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.218 | Weights_l2 --> 11215.899 | Lr --> 0.009 | Seconds_per_step --> 4.856 | [2024-08-31 07:44:54,659][Main][INFO] - [train] Step 8600 out of 20000 | Loss --> 1.951 | Grad_l2 --> 0.211 | Weights_l2 --> 11216.430 | Lr --> 0.009 | Seconds_per_step --> 4.858 | [2024-08-31 07:46:57,810][Main][INFO] - [train] Step 8625 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.213 | Weights_l2 --> 11216.965 | Lr --> 0.009 | Seconds_per_step --> 4.926 | [2024-08-31 07:48:59,572][Main][INFO] - [train] Step 8650 out of 20000 | Loss --> 1.948 | Grad_l2 --> 0.216 | Weights_l2 --> 11217.500 | Lr --> 0.009 | Seconds_per_step --> 4.870 | [2024-08-31 07:51:01,178][Main][INFO] - [train] Step 8675 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.212 | Weights_l2 --> 11218.038 | Lr --> 0.009 | Seconds_per_step --> 4.864 | [2024-08-31 07:53:04,308][Main][INFO] - [train] Step 8700 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.208 | Weights_l2 --> 11218.557 | Lr --> 0.009 | Seconds_per_step --> 4.925 | [2024-08-31 07:55:06,183][Main][INFO] - [train] Step 8725 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.223 | Weights_l2 --> 11219.081 | Lr --> 0.009 | Seconds_per_step --> 4.875 | [2024-08-31 07:57:07,685][Main][INFO] - [train] Step 8750 out of 20000 | Loss --> 1.941 | Grad_l2 --> 0.247 | Weights_l2 --> 11219.635 | Lr --> 0.009 | Seconds_per_step --> 4.860 | [2024-08-31 07:59:09,331][Main][INFO] - [train] Step 8775 out of 20000 | Loss --> 1.954 | Grad_l2 --> 
0.250 | Weights_l2 --> 11220.232 | Lr --> 0.009 | Seconds_per_step --> 4.866 | [2024-08-31 08:01:12,729][Main][INFO] - [train] Step 8800 out of 20000 | Loss --> 1.952 | Grad_l2 --> 0.220 | Weights_l2 --> 11220.781 | Lr --> 0.009 | Seconds_per_step --> 4.936 | [2024-08-31 08:03:14,622][Main][INFO] - [train] Step 8825 out of 20000 | Loss --> 1.949 | Grad_l2 --> 0.223 | Weights_l2 --> 11221.312 | Lr --> 0.008 | Seconds_per_step --> 4.876 | [2024-08-31 08:05:16,325][Main][INFO] - [train] Step 8850 out of 20000 | Loss --> 1.939 | Grad_l2 --> 0.214 | Weights_l2 --> 11221.830 | Lr --> 0.008 | Seconds_per_step --> 4.868 | [2024-08-31 08:07:19,742][Main][INFO] - [train] Step 8875 out of 20000 | Loss --> 1.946 | Grad_l2 --> 0.216 | Weights_l2 --> 11222.326 | Lr --> 0.008 | Seconds_per_step --> 4.937 | [2024-08-31 08:09:22,518][Main][INFO] - [train] Step 8900 out of 20000 | Loss --> 1.939 | Grad_l2 --> 0.220 | Weights_l2 --> 11222.839 | Lr --> 0.008 | Seconds_per_step --> 4.911 | [2024-08-31 08:11:24,560][Main][INFO] - [train] Step 8925 out of 20000 | Loss --> 1.950 | Grad_l2 --> 0.218 | Weights_l2 --> 11223.347 | Lr --> 0.008 | Seconds_per_step --> 4.882 | [2024-08-31 08:13:28,122][Main][INFO] - [train] Step 8950 out of 20000 | Loss --> 1.947 | Grad_l2 --> 0.217 | Weights_l2 --> 11223.844 | Lr --> 0.008 | Seconds_per_step --> 4.942 | [2024-08-31 08:15:30,378][Main][INFO] - [train] Step 8975 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.217 | Weights_l2 --> 11224.362 | Lr --> 0.008 | Seconds_per_step --> 4.890 | [2024-08-31 08:17:32,027][Main][INFO] - [train] Step 9000 out of 20000 | Loss --> 1.926 | Grad_l2 --> 0.214 | Weights_l2 --> 11224.870 | Lr --> 0.008 | Seconds_per_step --> 4.866 | [2024-08-31 08:19:35,112][Main][INFO] - [train] Step 9025 out of 20000 | Loss --> 1.950 | Grad_l2 --> 0.213 | Weights_l2 --> 11225.352 | Lr --> 0.008 | Seconds_per_step --> 4.923 | [2024-08-31 08:21:37,091][Main][INFO] - [train] Step 9050 out of 20000 | Loss --> 1.940 | Grad_l2 --> 0.216 | 
Weights_l2 --> 11225.836 | Lr --> 0.008 | Seconds_per_step --> 4.879 | [2024-08-31 08:23:39,114][Main][INFO] - [train] Step 9075 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.218 | Weights_l2 --> 11226.343 | Lr --> 0.008 | Seconds_per_step --> 4.881 | [2024-08-31 08:25:42,309][Main][INFO] - [train] Step 9100 out of 20000 | Loss --> 1.932 | Grad_l2 --> 0.219 | Weights_l2 --> 11226.790 | Lr --> 0.008 | Seconds_per_step --> 4.928 | [2024-08-31 08:27:43,895][Main][INFO] - [train] Step 9125 out of 20000 | Loss --> 1.925 | Grad_l2 --> 0.212 | Weights_l2 --> 11227.270 | Lr --> 0.008 | Seconds_per_step --> 4.863 | [2024-08-31 08:29:45,690][Main][INFO] - [train] Step 9150 out of 20000 | Loss --> 1.930 | Grad_l2 --> 0.216 | Weights_l2 --> 11227.751 | Lr --> 0.008 | Seconds_per_step --> 4.872 | [2024-08-31 08:31:47,340][Main][INFO] - [train] Step 9175 out of 20000 | Loss --> 1.921 | Grad_l2 --> 0.210 | Weights_l2 --> 11228.209 | Lr --> 0.008 | Seconds_per_step --> 4.866 | [2024-08-31 08:33:50,553][Main][INFO] - [train] Step 9200 out of 20000 | Loss --> 1.922 | Grad_l2 --> 0.218 | Weights_l2 --> 11228.667 | Lr --> 0.008 | Seconds_per_step --> 4.928 | [2024-08-31 08:35:52,332][Main][INFO] - [train] Step 9225 out of 20000 | Loss --> 1.928 | Grad_l2 --> 0.215 | Weights_l2 --> 11229.126 | Lr --> 0.008 | Seconds_per_step --> 4.871 | [2024-08-31 08:37:53,815][Main][INFO] - [train] Step 9250 out of 20000 | Loss --> 1.931 | Grad_l2 --> 0.211 | Weights_l2 --> 11229.582 | Lr --> 0.008 | Seconds_per_step --> 4.859 | [2024-08-31 08:39:57,086][Main][INFO] - [train] Step 9275 out of 20000 | Loss --> 1.927 | Grad_l2 --> 0.211 | Weights_l2 --> 11230.060 | Lr --> 0.008 | Seconds_per_step --> 4.931 | [2024-08-31 08:41:58,909][Main][INFO] - [train] Step 9300 out of 20000 | Loss --> 1.918 | Grad_l2 --> 0.218 | Weights_l2 --> 11230.518 | Lr --> 0.008 | Seconds_per_step --> 4.873 | [2024-08-31 08:44:00,623][Main][INFO] - [train] Step 9325 out of 20000 | Loss --> 1.933 | Grad_l2 --> 0.216 | 
Weights_l2 --> 11230.972 | Lr --> 0.008 | Seconds_per_step --> 4.868 | [2024-08-31 08:46:03,784][Main][INFO] - [train] Step 9350 out of 20000 | Loss --> 1.916 | Grad_l2 --> 0.216 | Weights_l2 --> 11231.439 | Lr --> 0.008 | Seconds_per_step --> 4.926 | [2024-08-31 08:48:05,439][Main][INFO] - [train] Step 9375 out of 20000 | Loss --> 1.937 | Grad_l2 --> 0.220 | Weights_l2 --> 11231.898 | Lr --> 0.008 | Seconds_per_step --> 4.866 | [2024-08-31 08:50:06,935][Main][INFO] - [train] Step 9400 out of 20000 | Loss --> 1.915 | Grad_l2 --> 0.212 | Weights_l2 --> 11232.362 | Lr --> 0.008 | Seconds_per_step --> 4.860 | [2024-08-31 08:52:10,261][Main][INFO] - [train] Step 9425 out of 20000 | Loss --> 1.907 | Grad_l2 --> 0.209 | Weights_l2 --> 11232.791 | Lr --> 0.008 | Seconds_per_step --> 4.933 | [2024-08-31 08:54:11,949][Main][INFO] - [train] Step 9450 out of 20000 | Loss --> 1.914 | Grad_l2 --> 0.209 | Weights_l2 --> 11233.217 | Lr --> 0.008 | Seconds_per_step --> 4.867 | [2024-08-31 08:56:13,860][Main][INFO] - [train] Step 9475 out of 20000 | Loss --> 1.916 | Grad_l2 --> 0.211 | Weights_l2 --> 11233.653 | Lr --> 0.008 | Seconds_per_step --> 4.876 | [2024-08-31 08:58:16,777][Main][INFO] - [train] Step 9500 out of 20000 | Loss --> 1.911 | Grad_l2 --> 0.214 | Weights_l2 --> 11234.110 | Lr --> 0.008 | Seconds_per_step --> 4.917 | [2024-08-31 09:00:18,486][Main][INFO] - [train] Step 9525 out of 20000 | Loss --> 1.914 | Grad_l2 --> 0.210 | Weights_l2 --> 11234.527 | Lr --> 0.008 | Seconds_per_step --> 4.868 | [2024-08-31 09:02:20,373][Main][INFO] - [train] Step 9550 out of 20000 | Loss --> 1.904 | Grad_l2 --> 0.210 | Weights_l2 --> 11234.960 | Lr --> 0.008 | Seconds_per_step --> 4.875 | [2024-08-31 09:04:23,807][Main][INFO] - [train] Step 9575 out of 20000 | Loss --> 1.904 | Grad_l2 --> 0.216 | Weights_l2 --> 11235.392 | Lr --> 0.008 | Seconds_per_step --> 4.937 | [2024-08-31 09:06:25,807][Main][INFO] - [train] Step 9600 out of 20000 | Loss --> 1.902 | Grad_l2 --> 0.211 | 
Weights_l2 --> 11235.828 | Lr --> 0.008 | Seconds_per_step --> 4.880 | [2024-08-31 09:08:27,695][Main][INFO] - [train] Step 9625 out of 20000 | Loss --> 1.907 | Grad_l2 --> 0.211 | Weights_l2 --> 11236.267 | Lr --> 0.008 | Seconds_per_step --> 4.875 | [2024-08-31 09:10:29,511][Main][INFO] - [train] Step 9650 out of 20000 | Loss --> 1.907 | Grad_l2 --> 0.208 | Weights_l2 --> 11236.680 | Lr --> 0.008 | Seconds_per_step --> 4.873 | [2024-08-31 09:12:32,790][Main][INFO] - [train] Step 9675 out of 20000 | Loss --> 1.915 | Grad_l2 --> 0.205 | Weights_l2 --> 11237.101 | Lr --> 0.008 | Seconds_per_step --> 4.931 | [2024-08-31 09:14:34,684][Main][INFO] - [train] Step 9700 out of 20000 | Loss --> 1.906 | Grad_l2 --> 0.207 | Weights_l2 --> 11237.492 | Lr --> 0.008 | Seconds_per_step --> 4.876 | [2024-08-31 09:16:36,449][Main][INFO] - [train] Step 9725 out of 20000 | Loss --> 1.908 | Grad_l2 --> 0.210 | Weights_l2 --> 11237.897 | Lr --> 0.008 | Seconds_per_step --> 4.871 | [2024-08-31 09:18:39,632][Main][INFO] - [train] Step 9750 out of 20000 | Loss --> 1.905 | Grad_l2 --> 0.209 | Weights_l2 --> 11238.297 | Lr --> 0.008 | Seconds_per_step --> 4.927 | [2024-08-31 09:20:41,253][Main][INFO] - [train] Step 9775 out of 20000 | Loss --> 1.898 | Grad_l2 --> 0.205 | Weights_l2 --> 11238.692 | Lr --> 0.008 | Seconds_per_step --> 4.865 | [2024-08-31 09:22:42,972][Main][INFO] - [train] Step 9800 out of 20000 | Loss --> 1.898 | Grad_l2 --> 0.205 | Weights_l2 --> 11239.115 | Lr --> 0.008 | Seconds_per_step --> 4.869 | [2024-08-31 09:24:45,983][Main][INFO] - [train] Step 9825 out of 20000 | Loss --> 1.898 | Grad_l2 --> 0.211 | Weights_l2 --> 11239.507 | Lr --> 0.008 | Seconds_per_step --> 4.920 | [2024-08-31 09:26:47,646][Main][INFO] - [train] Step 9850 out of 20000 | Loss --> 1.905 | Grad_l2 --> 0.212 | Weights_l2 --> 11239.896 | Lr --> 0.008 | Seconds_per_step --> 4.866 | [2024-08-31 09:28:49,085][Main][INFO] - [train] Step 9875 out of 20000 | Loss --> 1.908 | Grad_l2 --> 0.210 | 
Weights_l2 --> 11240.284 | Lr --> 0.008 | Seconds_per_step --> 4.857 | [2024-08-31 09:30:52,213][Main][INFO] - [train] Step 9900 out of 20000 | Loss --> 1.889 | Grad_l2 --> 0.208 | Weights_l2 --> 11240.678 | Lr --> 0.008 | Seconds_per_step --> 4.925 | [2024-08-31 09:32:53,902][Main][INFO] - [train] Step 9925 out of 20000 | Loss --> 1.895 | Grad_l2 --> 0.208 | Weights_l2 --> 11241.061 | Lr --> 0.008 | Seconds_per_step --> 4.867 | [2024-08-31 09:34:55,612][Main][INFO] - [train] Step 9950 out of 20000 | Loss --> 1.892 | Grad_l2 --> 0.207 | Weights_l2 --> 11241.429 | Lr --> 0.008 | Seconds_per_step --> 4.868 | [2024-08-31 09:36:58,742][Main][INFO] - [train] Step 9975 out of 20000 | Loss --> 1.890 | Grad_l2 --> 0.204 | Weights_l2 --> 11241.823 | Lr --> 0.008 | Seconds_per_step --> 4.925 | [2024-08-31 09:39:00,595][Main][INFO] - [train] Step 10000 out of 20000 | Loss --> 1.881 | Grad_l2 --> 0.207 | Weights_l2 --> 11242.179 | Lr --> 0.008 | Seconds_per_step --> 4.874 | [2024-08-31 09:39:00,596][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000 [2024-08-31 09:39:00,603][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 09:39:08,019][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors [2024-08-31 09:39:17,428][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin [2024-08-31 09:39:17,429][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin [2024-08-31 09:39:17,430][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin [2024-08-31 09:39:17,430][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin [2024-08-31 09:39:17,431][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl [2024-08-31 09:41:18,796][Main][INFO] - [train] Step 10025 out of 20000 | Loss --> 1.897 | Grad_l2 --> 0.207 | Weights_l2 --> 11242.581 | Lr --> 0.007 | Seconds_per_step --> 5.528 | [2024-08-31 09:43:20,384][Main][INFO] - [train] Step 10050 out of 20000 | Loss --> 1.895 | Grad_l2 --> 0.208 | Weights_l2 --> 11242.935 | Lr --> 0.007 | Seconds_per_step --> 4.863 | [2024-08-31 09:45:23,544][Main][INFO] - [train] Step 10075 out of 20000 | Loss --> 1.897 | Grad_l2 --> 0.208 | Weights_l2 --> 11243.311 | Lr --> 0.007 | Seconds_per_step --> 4.926 | [2024-08-31 09:47:25,242][Main][INFO] - [train] Step 10100 out of 20000 | Loss --> 1.874 | Grad_l2 --> 0.212 | Weights_l2 --> 11243.662 | Lr --> 0.007 | Seconds_per_step --> 4.868 | [2024-08-31 09:49:27,130][Main][INFO] - [train] Step 10125 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.213 | Weights_l2 --> 11244.042 | Lr --> 0.007 | Seconds_per_step --> 4.875 | [2024-08-31 09:51:30,546][Main][INFO] - [train] Step 10150 out of 20000 | Loss --> 1.893 | Grad_l2 --> 0.210 | Weights_l2 --> 11244.420 | Lr --> 0.007 | Seconds_per_step --> 4.937 | [2024-08-31 09:53:32,219][Main][INFO] - [train] Step 10175 out of 
20000 | Loss --> 1.880 | Grad_l2 --> 0.211 | Weights_l2 --> 11244.778 | Lr --> 0.007 | Seconds_per_step --> 4.867 | [2024-08-31 09:55:33,935][Main][INFO] - [train] Step 10200 out of 20000 | Loss --> 1.888 | Grad_l2 --> 0.210 | Weights_l2 --> 11245.142 | Lr --> 0.007 | Seconds_per_step --> 4.869 | [2024-08-31 09:57:37,097][Main][INFO] - [train] Step 10225 out of 20000 | Loss --> 1.890 | Grad_l2 --> 0.210 | Weights_l2 --> 11245.502 | Lr --> 0.007 | Seconds_per_step --> 4.926 | [2024-08-31 09:59:38,829][Main][INFO] - [train] Step 10250 out of 20000 | Loss --> 1.894 | Grad_l2 --> 0.206 | Weights_l2 --> 11245.861 | Lr --> 0.007 | Seconds_per_step --> 4.869 | [2024-08-31 10:01:40,434][Main][INFO] - [train] Step 10275 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.208 | Weights_l2 --> 11246.203 | Lr --> 0.007 | Seconds_per_step --> 4.864 | [2024-08-31 10:03:43,678][Main][INFO] - [train] Step 10300 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.208 | Weights_l2 --> 11246.551 | Lr --> 0.007 | Seconds_per_step --> 4.930 | [2024-08-31 10:05:45,211][Main][INFO] - [train] Step 10325 out of 20000 | Loss --> 1.891 | Grad_l2 --> 0.208 | Weights_l2 --> 11246.890 | Lr --> 0.007 | Seconds_per_step --> 4.861 | [2024-08-31 10:07:46,838][Main][INFO] - [train] Step 10350 out of 20000 | Loss --> 1.877 | Grad_l2 --> 0.206 | Weights_l2 --> 11247.220 | Lr --> 0.007 | Seconds_per_step --> 4.865 | [2024-08-31 10:09:49,932][Main][INFO] - [train] Step 10375 out of 20000 | Loss --> 1.883 | Grad_l2 --> 0.205 | Weights_l2 --> 11247.552 | Lr --> 0.007 | Seconds_per_step --> 4.924 | [2024-08-31 10:11:51,914][Main][INFO] - [train] Step 10400 out of 20000 | Loss --> 1.884 | Grad_l2 --> 0.205 | Weights_l2 --> 11247.910 | Lr --> 0.007 | Seconds_per_step --> 4.879 | [2024-08-31 10:13:53,538][Main][INFO] - [train] Step 10425 out of 20000 | Loss --> 1.890 | Grad_l2 --> 0.203 | Weights_l2 --> 11248.247 | Lr --> 0.007 | Seconds_per_step --> 4.865 | [2024-08-31 10:15:56,691][Main][INFO] - [train] Step 10450 out 
of 20000 | Loss --> 1.879 | Grad_l2 --> 0.207 | Weights_l2 --> 11248.574 | Lr --> 0.007 | Seconds_per_step --> 4.926 | [2024-08-31 10:17:58,450][Main][INFO] - [train] Step 10475 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.204 | Weights_l2 --> 11248.888 | Lr --> 0.007 | Seconds_per_step --> 4.870 | [2024-08-31 10:20:00,062][Main][INFO] - [train] Step 10500 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.206 | Weights_l2 --> 11249.219 | Lr --> 0.007 | Seconds_per_step --> 4.864 | [2024-08-31 10:22:01,623][Main][INFO] - [train] Step 10525 out of 20000 | Loss --> 1.881 | Grad_l2 --> 0.211 | Weights_l2 --> 11249.538 | Lr --> 0.007 | Seconds_per_step --> 4.862 | [2024-08-31 10:24:04,784][Main][INFO] - [train] Step 10550 out of 20000 | Loss --> 1.883 | Grad_l2 --> 0.205 | Weights_l2 --> 11249.862 | Lr --> 0.007 | Seconds_per_step --> 4.926 | [2024-08-31 10:26:06,294][Main][INFO] - [train] Step 10575 out of 20000 | Loss --> 1.879 | Grad_l2 --> 0.208 | Weights_l2 --> 11250.196 | Lr --> 0.007 | Seconds_per_step --> 4.860 | [2024-08-31 10:28:07,877][Main][INFO] - [train] Step 10600 out of 20000 | Loss --> 1.897 | Grad_l2 --> 0.208 | Weights_l2 --> 11250.519 | Lr --> 0.007 | Seconds_per_step --> 4.863 | [2024-08-31 10:30:11,017][Main][INFO] - [train] Step 10625 out of 20000 | Loss --> 1.877 | Grad_l2 --> 0.203 | Weights_l2 --> 11250.836 | Lr --> 0.007 | Seconds_per_step --> 4.926 | [2024-08-31 10:32:12,564][Main][INFO] - [train] Step 10650 out of 20000 | Loss --> 1.888 | Grad_l2 --> 0.208 | Weights_l2 --> 11251.148 | Lr --> 0.007 | Seconds_per_step --> 4.862 | [2024-08-31 10:34:14,220][Main][INFO] - [train] Step 10675 out of 20000 | Loss --> 1.887 | Grad_l2 --> 0.203 | Weights_l2 --> 11251.440 | Lr --> 0.007 | Seconds_per_step --> 4.866 | [2024-08-31 10:36:17,528][Main][INFO] - [train] Step 10700 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.204 | Weights_l2 --> 11251.738 | Lr --> 0.007 | Seconds_per_step --> 4.932 | [2024-08-31 10:38:19,033][Main][INFO] - [train] Step 10725 
out of 20000 | Loss --> 1.879 | Grad_l2 --> 0.204 | Weights_l2 --> 11252.035 | Lr --> 0.007 | Seconds_per_step --> 4.860 | [2024-08-31 10:40:20,686][Main][INFO] - [train] Step 10750 out of 20000 | Loss --> 1.875 | Grad_l2 --> 0.208 | Weights_l2 --> 11252.329 | Lr --> 0.007 | Seconds_per_step --> 4.866 | [2024-08-31 10:42:23,761][Main][INFO] - [train] Step 10775 out of 20000 | Loss --> 1.893 | Grad_l2 --> 0.201 | Weights_l2 --> 11252.655 | Lr --> 0.007 | Seconds_per_step --> 4.923 | [2024-08-31 10:44:25,178][Main][INFO] - [train] Step 10800 out of 20000 | Loss --> 1.879 | Grad_l2 --> 0.205 | Weights_l2 --> 11252.943 | Lr --> 0.007 | Seconds_per_step --> 4.857 | [2024-08-31 10:46:26,667][Main][INFO] - [train] Step 10825 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.208 | Weights_l2 --> 11253.228 | Lr --> 0.007 | Seconds_per_step --> 4.859 | [2024-08-31 10:48:29,893][Main][INFO] - [train] Step 10850 out of 20000 | Loss --> 1.881 | Grad_l2 --> 0.204 | Weights_l2 --> 11253.525 | Lr --> 0.007 | Seconds_per_step --> 4.929 | [2024-08-31 10:50:31,485][Main][INFO] - [train] Step 10875 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.205 | Weights_l2 --> 11253.804 | Lr --> 0.007 | Seconds_per_step --> 4.864 | [2024-08-31 10:52:33,088][Main][INFO] - [train] Step 10900 out of 20000 | Loss --> 1.881 | Grad_l2 --> 0.206 | Weights_l2 --> 11254.087 | Lr --> 0.007 | Seconds_per_step --> 4.864 | [2024-08-31 10:54:34,555][Main][INFO] - [train] Step 10925 out of 20000 | Loss --> 1.872 | Grad_l2 --> 0.205 | Weights_l2 --> 11254.364 | Lr --> 0.007 | Seconds_per_step --> 4.859 | [2024-08-31 10:56:37,704][Main][INFO] - [train] Step 10950 out of 20000 | Loss --> 1.887 | Grad_l2 --> 0.204 | Weights_l2 --> 11254.652 | Lr --> 0.007 | Seconds_per_step --> 4.926 | [2024-08-31 10:58:39,349][Main][INFO] - [train] Step 10975 out of 20000 | Loss --> 1.880 | Grad_l2 --> 0.203 | Weights_l2 --> 11254.940 | Lr --> 0.007 | Seconds_per_step --> 4.866 | [2024-08-31 11:00:40,761][Main][INFO] - [train] Step 
11000 out of 20000 | Loss --> 1.872 | Grad_l2 --> 0.203 | Weights_l2 --> 11255.197 | Lr --> 0.007 | Seconds_per_step --> 4.856 | [2024-08-31 11:02:43,937][Main][INFO] - [train] Step 11025 out of 20000 | Loss --> 1.888 | Grad_l2 --> 0.206 | Weights_l2 --> 11255.468 | Lr --> 0.007 | Seconds_per_step --> 4.927 | [2024-08-31 11:04:45,475][Main][INFO] - [train] Step 11050 out of 20000 | Loss --> 1.884 | Grad_l2 --> 0.203 | Weights_l2 --> 11255.733 | Lr --> 0.007 | Seconds_per_step --> 4.861 | [2024-08-31 11:06:47,030][Main][INFO] - [train] Step 11075 out of 20000 | Loss --> 1.872 | Grad_l2 --> 0.202 | Weights_l2 --> 11255.993 | Lr --> 0.006 | Seconds_per_step --> 4.862 | [2024-08-31 11:08:50,122][Main][INFO] - [train] Step 11100 out of 20000 | Loss --> 1.877 | Grad_l2 --> 0.205 | Weights_l2 --> 11256.257 | Lr --> 0.006 | Seconds_per_step --> 4.924 | [2024-08-31 11:10:51,718][Main][INFO] - [train] Step 11125 out of 20000 | Loss --> 1.888 | Grad_l2 --> 0.208 | Weights_l2 --> 11256.519 | Lr --> 0.006 | Seconds_per_step --> 4.864 | [2024-08-31 11:12:53,262][Main][INFO] - [train] Step 11150 out of 20000 | Loss --> 1.871 | Grad_l2 --> 0.202 | Weights_l2 --> 11256.768 | Lr --> 0.006 | Seconds_per_step --> 4.862 | [2024-08-31 11:14:56,320][Main][INFO] - [train] Step 11175 out of 20000 | Loss --> 1.880 | Grad_l2 --> 0.204 | Weights_l2 --> 11257.032 | Lr --> 0.006 | Seconds_per_step --> 4.922 | [2024-08-31 11:16:57,936][Main][INFO] - [train] Step 11200 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.209 | Weights_l2 --> 11257.286 | Lr --> 0.006 | Seconds_per_step --> 4.865 | [2024-08-31 11:18:59,500][Main][INFO] - [train] Step 11225 out of 20000 | Loss --> 1.891 | Grad_l2 --> 0.205 | Weights_l2 --> 11257.533 | Lr --> 0.006 | Seconds_per_step --> 4.862 | [2024-08-31 11:21:02,630][Main][INFO] - [train] Step 11250 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.204 | Weights_l2 --> 11257.790 | Lr --> 0.006 | Seconds_per_step --> 4.925 | [2024-08-31 11:23:04,112][Main][INFO] - [train] 
Step 11275 out of 20000 | Loss --> 1.882 | Grad_l2 --> 0.202 | Weights_l2 --> 11258.017 | Lr --> 0.006 | Seconds_per_step --> 4.859 | [2024-08-31 11:25:05,770][Main][INFO] - [train] Step 11300 out of 20000 | Loss --> 1.867 | Grad_l2 --> 0.205 | Weights_l2 --> 11258.263 | Lr --> 0.006 | Seconds_per_step --> 4.866 | [2024-08-31 11:27:08,802][Main][INFO] - [train] Step 11325 out of 20000 | Loss --> 1.882 | Grad_l2 --> 0.200 | Weights_l2 --> 11258.504 | Lr --> 0.006 | Seconds_per_step --> 4.921 | [2024-08-31 11:29:10,345][Main][INFO] - [train] Step 11350 out of 20000 | Loss --> 1.877 | Grad_l2 --> 0.199 | Weights_l2 --> 11258.721 | Lr --> 0.006 | Seconds_per_step --> 4.862 | [2024-08-31 11:31:11,797][Main][INFO] - [train] Step 11375 out of 20000 | Loss --> 1.886 | Grad_l2 --> 0.203 | Weights_l2 --> 11258.969 | Lr --> 0.006 | Seconds_per_step --> 4.858 | [2024-08-31 11:33:13,300][Main][INFO] - [train] Step 11400 out of 20000 | Loss --> 1.868 | Grad_l2 --> 0.207 | Weights_l2 --> 11259.173 | Lr --> 0.006 | Seconds_per_step --> 4.860 | [2024-08-31 11:35:16,336][Main][INFO] - [train] Step 11425 out of 20000 | Loss --> 1.877 | Grad_l2 --> 0.208 | Weights_l2 --> 11259.407 | Lr --> 0.006 | Seconds_per_step --> 4.921 | [2024-08-31 11:37:17,860][Main][INFO] - [train] Step 11450 out of 20000 | Loss --> 1.854 | Grad_l2 --> 0.202 | Weights_l2 --> 11259.657 | Lr --> 0.006 | Seconds_per_step --> 4.861 | [2024-08-31 11:39:19,326][Main][INFO] - [train] Step 11475 out of 20000 | Loss --> 1.878 | Grad_l2 --> 0.203 | Weights_l2 --> 11259.878 | Lr --> 0.006 | Seconds_per_step --> 4.859 | [2024-08-31 11:41:22,262][Main][INFO] - [train] Step 11500 out of 20000 | Loss --> 1.869 | Grad_l2 --> 0.206 | Weights_l2 --> 11260.097 | Lr --> 0.006 | Seconds_per_step --> 4.917 | [2024-08-31 11:43:23,664][Main][INFO] - [train] Step 11525 out of 20000 | Loss --> 1.870 | Grad_l2 --> 0.204 | Weights_l2 --> 11260.317 | Lr --> 0.006 | Seconds_per_step --> 4.856 | [2024-08-31 11:45:25,443][Main][INFO] - 
[train] Step 11550 out of 20000 | Loss --> 1.875 | Grad_l2 --> 0.202 | Weights_l2 --> 11260.541 | Lr --> 0.006 | Seconds_per_step --> 4.871 | [2024-08-31 11:47:28,732][Main][INFO] - [train] Step 11575 out of 20000 | Loss --> 1.870 | Grad_l2 --> 0.204 | Weights_l2 --> 11260.757 | Lr --> 0.006 | Seconds_per_step --> 4.931 | [2024-08-31 11:49:30,239][Main][INFO] - [train] Step 11600 out of 20000 | Loss --> 1.887 | Grad_l2 --> 0.202 | Weights_l2 --> 11260.980 | Lr --> 0.006 | Seconds_per_step --> 4.860 | [2024-08-31 11:51:31,749][Main][INFO] - [train] Step 11625 out of 20000 | Loss --> 1.875 | Grad_l2 --> 0.202 | Weights_l2 --> 11261.190 | Lr --> 0.006 | Seconds_per_step --> 4.860 | [2024-08-31 11:53:34,786][Main][INFO] - [train] Step 11650 out of 20000 | Loss --> 1.862 | Grad_l2 --> 0.202 | Weights_l2 --> 11261.412 | Lr --> 0.006 | Seconds_per_step --> 4.921 | [2024-08-31 11:55:36,387][Main][INFO] - [train] Step 11675 out of 20000 | Loss --> 1.885 | Grad_l2 --> 0.202 | Weights_l2 --> 11261.618 | Lr --> 0.006 | Seconds_per_step --> 4.864 | [2024-08-31 11:57:37,996][Main][INFO] - [train] Step 11700 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.204 | Weights_l2 --> 11261.816 | Lr --> 0.006 | Seconds_per_step --> 4.864 | [2024-08-31 11:59:41,039][Main][INFO] - [train] Step 11725 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.204 | Weights_l2 --> 11262.001 | Lr --> 0.006 | Seconds_per_step --> 4.922 | [2024-08-31 12:01:42,731][Main][INFO] - [train] Step 11750 out of 20000 | Loss --> 1.860 | Grad_l2 --> 0.203 | Weights_l2 --> 11262.202 | Lr --> 0.006 | Seconds_per_step --> 4.868 | [2024-08-31 12:03:44,246][Main][INFO] - [train] Step 11775 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.205 | Weights_l2 --> 11262.407 | Lr --> 0.006 | Seconds_per_step --> 4.860 | [2024-08-31 12:05:47,316][Main][INFO] - [train] Step 11800 out of 20000 | Loss --> 1.854 | Grad_l2 --> 0.200 | Weights_l2 --> 11262.592 | Lr --> 0.006 | Seconds_per_step --> 4.923 | [2024-08-31 12:07:48,887][Main][INFO] 
- [train] Step 11825 out of 20000 | Loss --> 1.872 | Grad_l2 --> 0.204 | Weights_l2 --> 11262.783 | Lr --> 0.006 | Seconds_per_step --> 4.863 | [2024-08-31 12:09:50,488][Main][INFO] - [train] Step 11850 out of 20000 | Loss --> 1.872 | Grad_l2 --> 0.205 | Weights_l2 --> 11262.987 | Lr --> 0.006 | Seconds_per_step --> 4.864 | [2024-08-31 12:11:52,256][Main][INFO] - [train] Step 11875 out of 20000 | Loss --> 1.873 | Grad_l2 --> 0.203 | Weights_l2 --> 11263.166 | Lr --> 0.006 | Seconds_per_step --> 4.871 | [2024-08-31 12:13:55,225][Main][INFO] - [train] Step 11900 out of 20000 | Loss --> 1.865 | Grad_l2 --> 0.206 | Weights_l2 --> 11263.352 | Lr --> 0.006 | Seconds_per_step --> 4.919 | [2024-08-31 12:15:56,821][Main][INFO] - [train] Step 11925 out of 20000 | Loss --> 1.863 | Grad_l2 --> 0.202 | Weights_l2 --> 11263.538 | Lr --> 0.006 | Seconds_per_step --> 4.864 | [2024-08-31 12:17:58,341][Main][INFO] - [train] Step 11950 out of 20000 | Loss --> 1.856 | Grad_l2 --> 0.201 | Weights_l2 --> 11263.700 | Lr --> 0.006 | Seconds_per_step --> 4.861 | [2024-08-31 12:20:01,477][Main][INFO] - [train] Step 11975 out of 20000 | Loss --> 1.864 | Grad_l2 --> 0.199 | Weights_l2 --> 11263.891 | Lr --> 0.006 | Seconds_per_step --> 4.925 | [2024-08-31 12:22:03,045][Main][INFO] - [train] Step 12000 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.202 | Weights_l2 --> 11264.073 | Lr --> 0.006 | Seconds_per_step --> 4.863 | [2024-08-31 12:24:04,755][Main][INFO] - [train] Step 12025 out of 20000 | Loss --> 1.866 | Grad_l2 --> 0.202 | Weights_l2 --> 11264.247 | Lr --> 0.006 | Seconds_per_step --> 4.868 | [2024-08-31 12:26:07,951][Main][INFO] - [train] Step 12050 out of 20000 | Loss --> 1.860 | Grad_l2 --> 0.200 | Weights_l2 --> 11264.425 | Lr --> 0.005 | Seconds_per_step --> 4.928 | [2024-08-31 12:28:09,572][Main][INFO] - [train] Step 12075 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.203 | Weights_l2 --> 11264.599 | Lr --> 0.005 | Seconds_per_step --> 4.865 | [2024-08-31 
12:30:11,087][Main][INFO] - [train] Step 12100 out of 20000 | Loss --> 1.875 | Grad_l2 --> 0.202 | Weights_l2 --> 11264.770 | Lr --> 0.005 | Seconds_per_step --> 4.861 | [2024-08-31 12:32:13,977][Main][INFO] - [train] Step 12125 out of 20000 | Loss --> 1.868 | Grad_l2 --> 0.200 | Weights_l2 --> 11264.930 | Lr --> 0.005 | Seconds_per_step --> 4.916 | [2024-08-31 12:34:15,797][Main][INFO] - [train] Step 12150 out of 20000 | Loss --> 1.850 | Grad_l2 --> 0.197 | Weights_l2 --> 11265.096 | Lr --> 0.005 | Seconds_per_step --> 4.873 | [2024-08-31 12:36:17,508][Main][INFO] - [train] Step 12175 out of 20000 | Loss --> 1.854 | Grad_l2 --> 0.198 | Weights_l2 --> 11265.248 | Lr --> 0.005 | Seconds_per_step --> 4.868 | [2024-08-31 12:38:20,824][Main][INFO] - [train] Step 12200 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.205 | Weights_l2 --> 11265.394 | Lr --> 0.005 | Seconds_per_step --> 4.933 | [2024-08-31 12:40:22,305][Main][INFO] - [train] Step 12225 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.201 | Weights_l2 --> 11265.552 | Lr --> 0.005 | Seconds_per_step --> 4.859 | [2024-08-31 12:42:23,739][Main][INFO] - [train] Step 12250 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.203 | Weights_l2 --> 11265.717 | Lr --> 0.005 | Seconds_per_step --> 4.857 | [2024-08-31 12:44:25,326][Main][INFO] - [train] Step 12275 out of 20000 | Loss --> 1.866 | Grad_l2 --> 0.200 | Weights_l2 --> 11265.860 | Lr --> 0.005 | Seconds_per_step --> 4.863 | [2024-08-31 12:46:28,287][Main][INFO] - [train] Step 12300 out of 20000 | Loss --> 1.846 | Grad_l2 --> 0.199 | Weights_l2 --> 11266.008 | Lr --> 0.005 | Seconds_per_step --> 4.918 | [2024-08-31 12:48:30,050][Main][INFO] - [train] Step 12325 out of 20000 | Loss --> 1.861 | Grad_l2 --> 0.200 | Weights_l2 --> 11266.151 | Lr --> 0.005 | Seconds_per_step --> 4.870 | [2024-08-31 12:50:31,681][Main][INFO] - [train] Step 12350 out of 20000 | Loss --> 1.869 | Grad_l2 --> 0.203 | Weights_l2 --> 11266.302 | Lr --> 0.005 | Seconds_per_step --> 4.865 | 
[2024-08-31 12:52:34,812][Main][INFO] - [train] Step 12375 out of 20000 | Loss --> 1.852 | Grad_l2 --> 0.202 | Weights_l2 --> 11266.433 | Lr --> 0.005 | Seconds_per_step --> 4.925 | [2024-08-31 12:54:36,418][Main][INFO] - [train] Step 12400 out of 20000 | Loss --> 1.853 | Grad_l2 --> 0.200 | Weights_l2 --> 11266.574 | Lr --> 0.005 | Seconds_per_step --> 4.864 | [2024-08-31 12:56:38,564][Main][INFO] - [train] Step 12425 out of 20000 | Loss --> 1.857 | Grad_l2 --> 0.199 | Weights_l2 --> 11266.707 | Lr --> 0.005 | Seconds_per_step --> 4.886 | [2024-08-31 12:58:41,786][Main][INFO] - [train] Step 12450 out of 20000 | Loss --> 1.853 | Grad_l2 --> 0.200 | Weights_l2 --> 11266.822 | Lr --> 0.005 | Seconds_per_step --> 4.929 | [2024-08-31 13:00:43,266][Main][INFO] - [train] Step 12475 out of 20000 | Loss --> 1.843 | Grad_l2 --> 0.200 | Weights_l2 --> 11266.969 | Lr --> 0.005 | Seconds_per_step --> 4.859 | [2024-08-31 13:02:45,015][Main][INFO] - [train] Step 12500 out of 20000 | Loss --> 1.862 | Grad_l2 --> 0.200 | Weights_l2 --> 11267.117 | Lr --> 0.005 | Seconds_per_step --> 4.870 | [2024-08-31 13:02:45,016][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-12500 [2024-08-31 13:02:45,023][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 13:02:52,611][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-12500/model.safetensors [2024-08-31 13:03:02,165][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-12500/optimizer.bin [2024-08-31 13:03:02,167][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-12500/scheduler.bin [2024-08-31 13:03:02,168][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-12500/sampler.bin [2024-08-31 13:03:02,168][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-12500/sampler_1.bin [2024-08-31 13:03:02,170][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-12500/random_states_0.pkl [2024-08-31 13:05:05,109][Main][INFO] - [train] Step 12525 out of 20000 | Loss --> 1.854 | Grad_l2 --> 0.205 | Weights_l2 --> 11267.235 | Lr --> 0.005 | Seconds_per_step --> 5.604 | [2024-08-31 13:07:06,726][Main][INFO] - [train] Step 12550 out of 20000 | Loss --> 1.857 | Grad_l2 --> 0.202 | Weights_l2 --> 11267.361 | Lr --> 0.005 | Seconds_per_step --> 4.865 | [2024-08-31 13:09:08,396][Main][INFO] - [train] Step 12575 out of 20000 | Loss --> 1.843 | Grad_l2 --> 0.200 | Weights_l2 --> 11267.490 | Lr --> 0.005 | Seconds_per_step --> 4.867 | [2024-08-31 13:11:12,957][Main][INFO] - [train] Step 12600 out of 20000 | Loss --> 1.842 | Grad_l2 --> 0.199 | Weights_l2 --> 11267.604 | Lr --> 0.005 | Seconds_per_step --> 4.982 | [2024-08-31 13:13:14,864][Main][INFO] - [train] Step 12625 out of 20000 | Loss --> 1.846 | Grad_l2 --> 0.201 | Weights_l2 --> 11267.733 | Lr --> 0.005 | Seconds_per_step --> 4.876 | [2024-08-31 13:15:16,523][Main][INFO] - [train] Step 12650 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.199 | Weights_l2 --> 11267.853 | Lr --> 0.005 | Seconds_per_step --> 4.866 | [2024-08-31 13:17:19,569][Main][INFO] - [train] Step 12675 out of 
20000 | Loss --> 1.841 | Grad_l2 --> 0.201 | Weights_l2 --> 11267.976 | Lr --> 0.005 | Seconds_per_step --> 4.922 | [2024-08-31 13:19:21,241][Main][INFO] - [train] Step 12700 out of 20000 | Loss --> 1.853 | Grad_l2 --> 0.200 | Weights_l2 --> 11268.100 | Lr --> 0.005 | Seconds_per_step --> 4.867 | [2024-08-31 13:21:22,899][Main][INFO] - [train] Step 12725 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.198 | Weights_l2 --> 11268.192 | Lr --> 0.005 | Seconds_per_step --> 4.866 | [2024-08-31 13:23:24,623][Main][INFO] - [train] Step 12750 out of 20000 | Loss --> 1.847 | Grad_l2 --> 0.201 | Weights_l2 --> 11268.309 | Lr --> 0.005 | Seconds_per_step --> 4.869 | [2024-08-31 13:25:27,641][Main][INFO] - [train] Step 12775 out of 20000 | Loss --> 1.843 | Grad_l2 --> 0.199 | Weights_l2 --> 11268.418 | Lr --> 0.005 | Seconds_per_step --> 4.921 | [2024-08-31 13:27:29,458][Main][INFO] - [train] Step 12800 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.200 | Weights_l2 --> 11268.524 | Lr --> 0.005 | Seconds_per_step --> 4.873 | [2024-08-31 13:29:31,221][Main][INFO] - [train] Step 12825 out of 20000 | Loss --> 1.834 | Grad_l2 --> 0.201 | Weights_l2 --> 11268.620 | Lr --> 0.005 | Seconds_per_step --> 4.870 | [2024-08-31 13:31:34,420][Main][INFO] - [train] Step 12850 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.197 | Weights_l2 --> 11268.724 | Lr --> 0.005 | Seconds_per_step --> 4.928 | [2024-08-31 13:33:36,046][Main][INFO] - [train] Step 12875 out of 20000 | Loss --> 1.842 | Grad_l2 --> 0.199 | Weights_l2 --> 11268.825 | Lr --> 0.005 | Seconds_per_step --> 4.865 | [2024-08-31 13:35:37,654][Main][INFO] - [train] Step 12900 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.203 | Weights_l2 --> 11268.936 | Lr --> 0.005 | Seconds_per_step --> 4.864 | [2024-08-31 13:37:40,879][Main][INFO] - [train] Step 12925 out of 20000 | Loss --> 1.842 | Grad_l2 --> 0.201 | Weights_l2 --> 11269.034 | Lr --> 0.005 | Seconds_per_step --> 4.929 | [2024-08-31 13:39:42,634][Main][INFO] - [train] Step 12950 out 
of 20000 | Loss --> 1.840 | Grad_l2 --> 0.200 | Weights_l2 --> 11269.130 | Lr --> 0.005 | Seconds_per_step --> 4.870 | [2024-08-31 13:41:44,422][Main][INFO] - [train] Step 12975 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.205 | Weights_l2 --> 11269.221 | Lr --> 0.005 | Seconds_per_step --> 4.871 | [2024-08-31 13:43:47,666][Main][INFO] - [train] Step 13000 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.200 | Weights_l2 --> 11269.310 | Lr --> 0.004 | Seconds_per_step --> 4.930 | [2024-08-31 13:45:49,381][Main][INFO] - [train] Step 13025 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.202 | Weights_l2 --> 11269.409 | Lr --> 0.004 | Seconds_per_step --> 4.869 | [2024-08-31 13:47:50,919][Main][INFO] - [train] Step 13050 out of 20000 | Loss --> 1.831 | Grad_l2 --> 0.202 | Weights_l2 --> 11269.506 | Lr --> 0.004 | Seconds_per_step --> 4.861 | [2024-08-31 13:49:54,013][Main][INFO] - [train] Step 13075 out of 20000 | Loss --> 1.817 | Grad_l2 --> 0.197 | Weights_l2 --> 11269.578 | Lr --> 0.004 | Seconds_per_step --> 4.924 | [2024-08-31 13:51:55,733][Main][INFO] - [train] Step 13100 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.198 | Weights_l2 --> 11269.664 | Lr --> 0.004 | Seconds_per_step --> 4.869 | [2024-08-31 13:53:57,429][Main][INFO] - [train] Step 13125 out of 20000 | Loss --> 1.841 | Grad_l2 --> 0.201 | Weights_l2 --> 11269.760 | Lr --> 0.004 | Seconds_per_step --> 4.868 | [2024-08-31 13:56:00,695][Main][INFO] - [train] Step 13150 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.201 | Weights_l2 --> 11269.839 | Lr --> 0.004 | Seconds_per_step --> 4.931 | [2024-08-31 13:58:02,278][Main][INFO] - [train] Step 13175 out of 20000 | Loss --> 1.833 | Grad_l2 --> 0.198 | Weights_l2 --> 11269.916 | Lr --> 0.004 | Seconds_per_step --> 4.863 | [2024-08-31 14:00:03,967][Main][INFO] - [train] Step 13200 out of 20000 | Loss --> 1.839 | Grad_l2 --> 0.197 | Weights_l2 --> 11269.997 | Lr --> 0.004 | Seconds_per_step --> 4.867 | [2024-08-31 14:02:05,673][Main][INFO] - [train] Step 13225 
out of 20000 | Loss --> 1.838 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.075 | Lr --> 0.004 | Seconds_per_step --> 4.868 | [2024-08-31 14:04:09,449][Main][INFO] - [train] Step 13250 out of 20000 | Loss --> 1.831 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.153 | Lr --> 0.004 | Seconds_per_step --> 4.951 | [2024-08-31 14:06:10,992][Main][INFO] - [train] Step 13275 out of 20000 | Loss --> 1.827 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.239 | Lr --> 0.004 | Seconds_per_step --> 4.862 | [2024-08-31 14:08:12,811][Main][INFO] - [train] Step 13300 out of 20000 | Loss --> 1.826 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.305 | Lr --> 0.004 | Seconds_per_step --> 4.873 | [2024-08-31 14:10:16,220][Main][INFO] - [train] Step 13325 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.387 | Lr --> 0.004 | Seconds_per_step --> 4.936 | [2024-08-31 14:12:18,200][Main][INFO] - [train] Step 13350 out of 20000 | Loss --> 1.831 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.460 | Lr --> 0.004 | Seconds_per_step --> 4.879 | [2024-08-31 14:14:20,142][Main][INFO] - [train] Step 13375 out of 20000 | Loss --> 1.835 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.525 | Lr --> 0.004 | Seconds_per_step --> 4.878 | [2024-08-31 14:16:23,080][Main][INFO] - [train] Step 13400 out of 20000 | Loss --> 1.818 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.585 | Lr --> 0.004 | Seconds_per_step --> 4.917 | [2024-08-31 14:18:24,632][Main][INFO] - [train] Step 13425 out of 20000 | Loss --> 1.840 | Grad_l2 --> 0.201 | Weights_l2 --> 11270.665 | Lr --> 0.004 | Seconds_per_step --> 4.862 | [2024-08-31 14:20:26,249][Main][INFO] - [train] Step 13450 out of 20000 | Loss --> 1.825 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.716 | Lr --> 0.004 | Seconds_per_step --> 4.865 | [2024-08-31 14:22:29,448][Main][INFO] - [train] Step 13475 out of 20000 | Loss --> 1.823 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.789 | Lr --> 0.004 | Seconds_per_step --> 4.928 | [2024-08-31 14:24:31,223][Main][INFO] - [train] Step 
13500 out of 20000 | Loss --> 1.823 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.827 | Lr --> 0.004 | Seconds_per_step --> 4.871 | [2024-08-31 14:26:32,982][Main][INFO] - [train] Step 13525 out of 20000 | Loss --> 1.832 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.896 | Lr --> 0.004 | Seconds_per_step --> 4.870 | [2024-08-31 14:28:36,208][Main][INFO] - [train] Step 13550 out of 20000 | Loss --> 1.829 | Grad_l2 --> 0.202 | Weights_l2 --> 11270.960 | Lr --> 0.004 | Seconds_per_step --> 4.929 | [2024-08-31 14:30:37,886][Main][INFO] - [train] Step 13575 out of 20000 | Loss --> 1.821 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.013 | Lr --> 0.004 | Seconds_per_step --> 4.867 | [2024-08-31 14:32:39,401][Main][INFO] - [train] Step 13600 out of 20000 | Loss --> 1.820 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.069 | Lr --> 0.004 | Seconds_per_step --> 4.860 | [2024-08-31 14:34:42,768][Main][INFO] - [train] Step 13625 out of 20000 | Loss --> 1.825 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.126 | Lr --> 0.004 | Seconds_per_step --> 4.935 | [2024-08-31 14:36:44,344][Main][INFO] - [train] Step 13650 out of 20000 | Loss --> 1.816 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.186 | Lr --> 0.004 | Seconds_per_step --> 4.863 | [2024-08-31 14:38:45,969][Main][INFO] - [train] Step 13675 out of 20000 | Loss --> 1.819 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.245 | Lr --> 0.004 | Seconds_per_step --> 4.865 | [2024-08-31 14:40:47,649][Main][INFO] - [train] Step 13700 out of 20000 | Loss --> 1.827 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.287 | Lr --> 0.004 | Seconds_per_step --> 4.867 | [2024-08-31 14:42:50,728][Main][INFO] - [train] Step 13725 out of 20000 | Loss --> 1.825 | Grad_l2 --> 0.202 | Weights_l2 --> 11271.333 | Lr --> 0.004 | Seconds_per_step --> 4.923 | [2024-08-31 14:44:52,286][Main][INFO] - [train] Step 13750 out of 20000 | Loss --> 1.818 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.387 | Lr --> 0.004 | Seconds_per_step --> 4.862 | [2024-08-31 14:46:53,961][Main][INFO] - [train] 
Step 13775 out of 20000 | Loss --> 1.825 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.435 | Lr --> 0.004 | Seconds_per_step --> 4.867 | [2024-08-31 14:48:57,481][Main][INFO] - [train] Step 13800 out of 20000 | Loss --> 1.814 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.467 | Lr --> 0.004 | Seconds_per_step --> 4.941 | [2024-08-31 14:50:59,087][Main][INFO] - [train] Step 13825 out of 20000 | Loss --> 1.812 | Grad_l2 --> 0.203 | Weights_l2 --> 11271.525 | Lr --> 0.004 | Seconds_per_step --> 4.864 | [2024-08-31 14:53:00,819][Main][INFO] - [train] Step 13850 out of 20000 | Loss --> 1.805 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.566 | Lr --> 0.004 | Seconds_per_step --> 4.869 | [2024-08-31 14:55:04,260][Main][INFO] - [train] Step 13875 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.613 | Lr --> 0.004 | Seconds_per_step --> 4.938 | [2024-08-31 14:57:06,035][Main][INFO] - [train] Step 13900 out of 20000 | Loss --> 1.817 | Grad_l2 --> 0.202 | Weights_l2 --> 11271.658 | Lr --> 0.004 | Seconds_per_step --> 4.871 | [2024-08-31 14:59:07,735][Main][INFO] - [train] Step 13925 out of 20000 | Loss --> 1.813 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.691 | Lr --> 0.004 | Seconds_per_step --> 4.868 | [2024-08-31 15:01:10,804][Main][INFO] - [train] Step 13950 out of 20000 | Loss --> 1.814 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.729 | Lr --> 0.004 | Seconds_per_step --> 4.923 | [2024-08-31 15:03:12,518][Main][INFO] - [train] Step 13975 out of 20000 | Loss --> 1.807 | Grad_l2 --> 0.203 | Weights_l2 --> 11271.768 | Lr --> 0.003 | Seconds_per_step --> 4.868 | [2024-08-31 15:05:14,065][Main][INFO] - [train] Step 14000 out of 20000 | Loss --> 1.830 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.802 | Lr --> 0.003 | Seconds_per_step --> 4.862 | [2024-08-31 15:07:17,296][Main][INFO] - [train] Step 14025 out of 20000 | Loss --> 1.806 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.855 | Lr --> 0.003 | Seconds_per_step --> 4.929 | [2024-08-31 15:09:19,178][Main][INFO] - 
[train] Step 14050 out of 20000 | Loss --> 1.803 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.896 | Lr --> 0.003 | Seconds_per_step --> 4.875 | [2024-08-31 15:11:20,981][Main][INFO] - [train] Step 14075 out of 20000 | Loss --> 1.801 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.926 | Lr --> 0.003 | Seconds_per_step --> 4.872 | [2024-08-31 15:13:22,831][Main][INFO] - [train] Step 14100 out of 20000 | Loss --> 1.815 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.954 | Lr --> 0.003 | Seconds_per_step --> 4.874 | [2024-08-31 15:15:26,181][Main][INFO] - [train] Step 14125 out of 20000 | Loss --> 1.804 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.983 | Lr --> 0.003 | Seconds_per_step --> 4.934 | [2024-08-31 15:17:27,954][Main][INFO] - [train] Step 14150 out of 20000 | Loss --> 1.815 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.023 | Lr --> 0.003 | Seconds_per_step --> 4.871 | [2024-08-31 15:19:29,837][Main][INFO] - [train] Step 14175 out of 20000 | Loss --> 1.810 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.059 | Lr --> 0.003 | Seconds_per_step --> 4.875 | [2024-08-31 15:21:33,066][Main][INFO] - [train] Step 14200 out of 20000 | Loss --> 1.803 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.082 | Lr --> 0.003 | Seconds_per_step --> 4.929 | [2024-08-31 15:23:34,757][Main][INFO] - [train] Step 14225 out of 20000 | Loss --> 1.811 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.111 | Lr --> 0.003 | Seconds_per_step --> 4.868 | [2024-08-31 15:25:36,501][Main][INFO] - [train] Step 14250 out of 20000 | Loss --> 1.806 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.133 | Lr --> 0.003 | Seconds_per_step --> 4.870 | [2024-08-31 15:27:39,888][Main][INFO] - [train] Step 14275 out of 20000 | Loss --> 1.797 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.157 | Lr --> 0.003 | Seconds_per_step --> 4.935 | [2024-08-31 15:29:41,445][Main][INFO] - [train] Step 14300 out of 20000 | Loss --> 1.806 | Grad_l2 --> 0.201 | Weights_l2 --> 11272.188 | Lr --> 0.003 | Seconds_per_step --> 4.862 | [2024-08-31 15:31:43,160][Main][INFO] 
- [train] Step 14325 out of 20000 | Loss --> 1.792 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.208 | Lr --> 0.003 | Seconds_per_step --> 4.869 | [2024-08-31 15:33:46,458][Main][INFO] - [train] Step 14350 out of 20000 | Loss --> 1.805 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.226 | Lr --> 0.003 | Seconds_per_step --> 4.932 | [2024-08-31 15:35:48,149][Main][INFO] - [train] Step 14375 out of 20000 | Loss --> 1.794 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.236 | Lr --> 0.003 | Seconds_per_step --> 4.868 | [2024-08-31 15:37:50,043][Main][INFO] - [train] Step 14400 out of 20000 | Loss --> 1.788 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.252 | Lr --> 0.003 | Seconds_per_step --> 4.876 | [2024-08-31 15:39:53,218][Main][INFO] - [train] Step 14425 out of 20000 | Loss --> 1.798 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.273 | Lr --> 0.003 | Seconds_per_step --> 4.927 | [2024-08-31 15:41:54,954][Main][INFO] - [train] Step 14450 out of 20000 | Loss --> 1.795 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.285 | Lr --> 0.003 | Seconds_per_step --> 4.869 | [2024-08-31 15:43:56,819][Main][INFO] - [train] Step 14475 out of 20000 | Loss --> 1.801 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.309 | Lr --> 0.003 | Seconds_per_step --> 4.874 | [2024-08-31 15:46:00,146][Main][INFO] - [train] Step 14500 out of 20000 | Loss --> 1.804 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.310 | Lr --> 0.003 | Seconds_per_step --> 4.933 | [2024-08-31 15:48:01,726][Main][INFO] - [train] Step 14525 out of 20000 | Loss --> 1.799 | Grad_l2 --> 0.201 | Weights_l2 --> 11272.321 | Lr --> 0.003 | Seconds_per_step --> 4.863 | [2024-08-31 15:50:03,576][Main][INFO] - [train] Step 14550 out of 20000 | Loss --> 1.798 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.333 | Lr --> 0.003 | Seconds_per_step --> 4.874 | [2024-08-31 15:52:05,506][Main][INFO] - [train] Step 14575 out of 20000 | Loss --> 1.784 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.344 | Lr --> 0.003 | Seconds_per_step --> 4.877 | [2024-08-31 
15:54:08,843][Main][INFO] - [train] Step 14600 out of 20000 | Loss --> 1.800 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.352 | Lr --> 0.003 | Seconds_per_step --> 4.933 | [2024-08-31 15:56:10,616][Main][INFO] - [train] Step 14625 out of 20000 | Loss --> 1.791 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.373 | Lr --> 0.003 | Seconds_per_step --> 4.871 | [2024-08-31 15:58:12,333][Main][INFO] - [train] Step 14650 out of 20000 | Loss --> 1.790 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.382 | Lr --> 0.003 | Seconds_per_step --> 4.869 | [2024-08-31 16:00:15,580][Main][INFO] - [train] Step 14675 out of 20000 | Loss --> 1.792 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.384 | Lr --> 0.003 | Seconds_per_step --> 4.930 | [2024-08-31 16:02:17,498][Main][INFO] - [train] Step 14700 out of 20000 | Loss --> 1.794 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.395 | Lr --> 0.003 | Seconds_per_step --> 4.877 | [2024-08-31 16:04:19,351][Main][INFO] - [train] Step 14725 out of 20000 | Loss --> 1.795 | Grad_l2 --> 0.201 | Weights_l2 --> 11272.397 | Lr --> 0.003 | Seconds_per_step --> 4.874 | [2024-08-31 16:06:22,584][Main][INFO] - [train] Step 14750 out of 20000 | Loss --> 1.793 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.426 | Lr --> 0.003 | Seconds_per_step --> 4.929 | [2024-08-31 16:08:24,291][Main][INFO] - [train] Step 14775 out of 20000 | Loss --> 1.789 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.426 | Lr --> 0.003 | Seconds_per_step --> 4.868 | [2024-08-31 16:10:25,837][Main][INFO] - [train] Step 14800 out of 20000 | Loss --> 1.786 | Grad_l2 --> 0.201 | Weights_l2 --> 11272.442 | Lr --> 0.003 | Seconds_per_step --> 4.862 | [2024-08-31 16:12:28,904][Main][INFO] - [train] Step 14825 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.448 | Lr --> 0.003 | Seconds_per_step --> 4.923 | [2024-08-31 16:14:30,571][Main][INFO] - [train] Step 14850 out of 20000 | Loss --> 1.802 | Grad_l2 --> 0.195 | Weights_l2 --> 11272.449 | Lr --> 0.003 | Seconds_per_step --> 4.867 | 
[2024-08-31 16:16:32,345][Main][INFO] - [train] Step 14875 out of 20000 | Loss --> 1.788 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.455 | Lr --> 0.003 | Seconds_per_step --> 4.871 | [2024-08-31 16:18:35,609][Main][INFO] - [train] Step 14900 out of 20000 | Loss --> 1.796 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.453 | Lr --> 0.003 | Seconds_per_step --> 4.930 | [2024-08-31 16:20:37,275][Main][INFO] - [train] Step 14925 out of 20000 | Loss --> 1.784 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.459 | Lr --> 0.003 | Seconds_per_step --> 4.867 | [2024-08-31 16:22:38,507][Main][INFO] - [train] Step 14950 out of 20000 | Loss --> 1.782 | Grad_l2 --> 0.195 | Weights_l2 --> 11272.466 | Lr --> 0.003 | Seconds_per_step --> 4.849 | [2024-08-31 16:24:41,651][Main][INFO] - [train] Step 14975 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.464 | Lr --> 0.003 | Seconds_per_step --> 4.926 | [2024-08-31 16:26:43,515][Main][INFO] - [train] Step 15000 out of 20000 | Loss --> 1.784 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.456 | Lr --> 0.003 | Seconds_per_step --> 4.874 | [2024-08-31 16:26:43,516][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-15000 [2024-08-31 16:26:43,523][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 16:26:51,495][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-15000/model.safetensors [2024-08-31 16:27:00,844][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-15000/optimizer.bin [2024-08-31 16:27:00,846][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-15000/scheduler.bin [2024-08-31 16:27:00,846][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-15000/sampler.bin [2024-08-31 16:27:00,847][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-15000/sampler_1.bin [2024-08-31 16:27:00,848][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-15000/random_states_0.pkl [2024-08-31 16:29:02,515][Main][INFO] - [train] Step 15025 out of 20000 | Loss --> 1.776 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.459 | Lr --> 0.002 | Seconds_per_step --> 5.560 | [2024-08-31 16:31:04,558][Main][INFO] - [train] Step 15050 out of 20000 | Loss --> 1.786 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.459 | Lr --> 0.002 | Seconds_per_step --> 4.882 | [2024-08-31 16:33:07,938][Main][INFO] - [train] Step 15075 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.457 | Lr --> 0.002 | Seconds_per_step --> 4.935 | [2024-08-31 16:35:10,104][Main][INFO] - [train] Step 15100 out of 20000 | Loss --> 1.787 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.457 | Lr --> 0.002 | Seconds_per_step --> 4.887 | [2024-08-31 16:37:12,233][Main][INFO] - [train] Step 15125 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.453 | Lr --> 0.002 | Seconds_per_step --> 4.885 | [2024-08-31 16:39:25,389][Main][INFO] - [train] Step 15150 out of 20000 | Loss --> 1.775 | Grad_l2 --> 0.195 | Weights_l2 --> 11272.450 | Lr --> 0.002 | Seconds_per_step --> 5.326 | [2024-08-31 16:41:27,294][Main][INFO] - [train] Step 15175 out of 
20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.449 | Lr --> 0.002 | Seconds_per_step --> 4.876 | [2024-08-31 16:43:29,395][Main][INFO] - [train] Step 15200 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.450 | Lr --> 0.002 | Seconds_per_step --> 4.884 | [2024-08-31 16:45:33,585][Main][INFO] - [train] Step 15225 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.445 | Lr --> 0.002 | Seconds_per_step --> 4.967 | [2024-08-31 16:47:35,330][Main][INFO] - [train] Step 15250 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.436 | Lr --> 0.002 | Seconds_per_step --> 4.870 | [2024-08-31 16:49:36,965][Main][INFO] - [train] Step 15275 out of 20000 | Loss --> 1.784 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.436 | Lr --> 0.002 | Seconds_per_step --> 4.865 | [2024-08-31 16:51:40,421][Main][INFO] - [train] Step 15300 out of 20000 | Loss --> 1.775 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.436 | Lr --> 0.002 | Seconds_per_step --> 4.938 | [2024-08-31 16:53:42,293][Main][INFO] - [train] Step 15325 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.427 | Lr --> 0.002 | Seconds_per_step --> 4.875 | [2024-08-31 16:55:44,207][Main][INFO] - [train] Step 15350 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.418 | Lr --> 0.002 | Seconds_per_step --> 4.876 | [2024-08-31 16:57:47,902][Main][INFO] - [train] Step 15375 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.413 | Lr --> 0.002 | Seconds_per_step --> 4.948 | [2024-08-31 16:59:49,534][Main][INFO] - [train] Step 15400 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.402 | Lr --> 0.002 | Seconds_per_step --> 4.865 | [2024-08-31 17:01:51,247][Main][INFO] - [train] Step 15425 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.391 | Lr --> 0.002 | Seconds_per_step --> 4.868 | [2024-08-31 17:03:53,130][Main][INFO] - [train] Step 15450 out 
of 20000 | Loss --> 1.780 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.394 | Lr --> 0.002 | Seconds_per_step --> 4.875 | [2024-08-31 17:05:56,534][Main][INFO] - [train] Step 15475 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.386 | Lr --> 0.002 | Seconds_per_step --> 4.936 | [2024-08-31 17:07:58,425][Main][INFO] - [train] Step 15500 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.378 | Lr --> 0.002 | Seconds_per_step --> 4.876 | [2024-08-31 17:10:00,358][Main][INFO] - [train] Step 15525 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.364 | Lr --> 0.002 | Seconds_per_step --> 4.877 | [2024-08-31 17:12:03,531][Main][INFO] - [train] Step 15550 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.349 | Lr --> 0.002 | Seconds_per_step --> 4.927 | [2024-08-31 17:14:05,478][Main][INFO] - [train] Step 15575 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.340 | Lr --> 0.002 | Seconds_per_step --> 4.878 | [2024-08-31 17:16:07,219][Main][INFO] - [train] Step 15600 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.323 | Lr --> 0.002 | Seconds_per_step --> 4.870 | [2024-08-31 17:18:10,558][Main][INFO] - [train] Step 15625 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.311 | Lr --> 0.002 | Seconds_per_step --> 4.933 | [2024-08-31 17:20:12,172][Main][INFO] - [train] Step 15650 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.303 | Lr --> 0.002 | Seconds_per_step --> 4.864 | [2024-08-31 17:22:13,878][Main][INFO] - [train] Step 15675 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.290 | Lr --> 0.002 | Seconds_per_step --> 4.868 | [2024-08-31 17:24:17,279][Main][INFO] - [train] Step 15700 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.273 | Lr --> 0.002 | Seconds_per_step --> 4.936 | [2024-08-31 17:26:19,126][Main][INFO] - [train] Step 15725 
out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.263 | Lr --> 0.002 | Seconds_per_step --> 4.874 | [2024-08-31 17:28:20,908][Main][INFO] - [train] Step 15750 out of 20000 | Loss --> 1.782 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.246 | Lr --> 0.002 | Seconds_per_step --> 4.871 | [2024-08-31 17:30:24,238][Main][INFO] - [train] Step 15775 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.233 | Lr --> 0.002 | Seconds_per_step --> 4.933 | [2024-08-31 17:32:26,169][Main][INFO] - [train] Step 15800 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.220 | Lr --> 0.002 | Seconds_per_step --> 4.877 | [2024-08-31 17:34:28,677][Main][INFO] - [train] Step 15825 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.207 | Lr --> 0.002 | Seconds_per_step --> 4.900 | [2024-08-31 17:36:32,092][Main][INFO] - [train] Step 15850 out of 20000 | Loss --> 1.781 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.196 | Lr --> 0.002 | Seconds_per_step --> 4.936 | [2024-08-31 17:38:33,746][Main][INFO] - [train] Step 15875 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.177 | Lr --> 0.002 | Seconds_per_step --> 4.866 | [2024-08-31 17:40:35,415][Main][INFO] - [train] Step 15900 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.165 | Lr --> 0.002 | Seconds_per_step --> 4.867 | [2024-08-31 17:42:36,852][Main][INFO] - [train] Step 15925 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.154 | Lr --> 0.002 | Seconds_per_step --> 4.857 | [2024-08-31 17:44:40,125][Main][INFO] - [train] Step 15950 out of 20000 | Loss --> 1.774 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.141 | Lr --> 0.002 | Seconds_per_step --> 4.931 | [2024-08-31 17:46:42,051][Main][INFO] - [train] Step 15975 out of 20000 | Loss --> 1.761 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.121 | Lr --> 0.002 | Seconds_per_step --> 4.877 | [2024-08-31 17:48:43,831][Main][INFO] - [train] Step 
16000 out of 20000 | Loss --> 1.775 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.110 | Lr --> 0.002 | Seconds_per_step --> 4.871 | [2024-08-31 17:50:47,287][Main][INFO] - [train] Step 16025 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.098 | Lr --> 0.002 | Seconds_per_step --> 4.938 | [2024-08-31 17:52:48,992][Main][INFO] - [train] Step 16050 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.202 | Weights_l2 --> 11272.084 | Lr --> 0.002 | Seconds_per_step --> 4.868 | [2024-08-31 17:54:50,724][Main][INFO] - [train] Step 16075 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.068 | Lr --> 0.002 | Seconds_per_step --> 4.869 | [2024-08-31 17:56:54,016][Main][INFO] - [train] Step 16100 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.054 | Lr --> 0.002 | Seconds_per_step --> 4.932 | [2024-08-31 17:58:55,942][Main][INFO] - [train] Step 16125 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.037 | Lr --> 0.002 | Seconds_per_step --> 4.877 | [2024-08-31 18:00:57,478][Main][INFO] - [train] Step 16150 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.020 | Lr --> 0.002 | Seconds_per_step --> 4.861 | [2024-08-31 18:03:00,736][Main][INFO] - [train] Step 16175 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.000 | Lr --> 0.002 | Seconds_per_step --> 4.930 | [2024-08-31 18:05:02,586][Main][INFO] - [train] Step 16200 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.988 | Lr --> 0.002 | Seconds_per_step --> 4.874 | [2024-08-31 18:07:04,438][Main][INFO] - [train] Step 16225 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.975 | Lr --> 0.002 | Seconds_per_step --> 4.874 | [2024-08-31 18:09:07,901][Main][INFO] - [train] Step 16250 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.962 | Lr --> 0.001 | Seconds_per_step --> 4.938 | [2024-08-31 18:11:09,649][Main][INFO] - [train] 
Step 16275 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.952 | Lr --> 0.001 | Seconds_per_step --> 4.870 | [2024-08-31 18:13:11,767][Main][INFO] - [train] Step 16300 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.935 | Lr --> 0.001 | Seconds_per_step --> 4.885 | [2024-08-31 18:15:13,819][Main][INFO] - [train] Step 16325 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.928 | Lr --> 0.001 | Seconds_per_step --> 4.882 | [2024-08-31 18:17:17,096][Main][INFO] - [train] Step 16350 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.916 | Lr --> 0.001 | Seconds_per_step --> 4.931 | [2024-08-31 18:19:18,879][Main][INFO] - [train] Step 16375 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.902 | Lr --> 0.001 | Seconds_per_step --> 4.871 | [2024-08-31 18:21:20,566][Main][INFO] - [train] Step 16400 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.886 | Lr --> 0.001 | Seconds_per_step --> 4.867 | [2024-08-31 18:23:24,550][Main][INFO] - [train] Step 16425 out of 20000 | Loss --> 1.752 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.874 | Lr --> 0.001 | Seconds_per_step --> 4.959 | [2024-08-31 18:25:26,461][Main][INFO] - [train] Step 16450 out of 20000 | Loss --> 1.756 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.860 | Lr --> 0.001 | Seconds_per_step --> 4.876 | [2024-08-31 18:27:28,278][Main][INFO] - [train] Step 16475 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.846 | Lr --> 0.001 | Seconds_per_step --> 4.873 | [2024-08-31 18:29:32,921][Main][INFO] - [train] Step 16500 out of 20000 | Loss --> 1.753 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.834 | Lr --> 0.001 | Seconds_per_step --> 4.986 | [2024-08-31 18:31:35,149][Main][INFO] - [train] Step 16525 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.819 | Lr --> 0.001 | Seconds_per_step --> 4.889 | [2024-08-31 18:33:37,364][Main][INFO] - 
[train] Step 16550 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.799 | Lr --> 0.001 | Seconds_per_step --> 4.889 | [2024-08-31 18:35:40,915][Main][INFO] - [train] Step 16575 out of 20000 | Loss --> 1.751 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.786 | Lr --> 0.001 | Seconds_per_step --> 4.942 | [2024-08-31 18:37:43,466][Main][INFO] - [train] Step 16600 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.770 | Lr --> 0.001 | Seconds_per_step --> 4.902 | [2024-08-31 18:39:45,637][Main][INFO] - [train] Step 16625 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.754 | Lr --> 0.001 | Seconds_per_step --> 4.887 | [2024-08-31 18:41:49,405][Main][INFO] - [train] Step 16650 out of 20000 | Loss --> 1.748 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.739 | Lr --> 0.001 | Seconds_per_step --> 4.951 | [2024-08-31 18:43:51,102][Main][INFO] - [train] Step 16675 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.723 | Lr --> 0.001 | Seconds_per_step --> 4.868 | [2024-08-31 18:45:52,944][Main][INFO] - [train] Step 16700 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.708 | Lr --> 0.001 | Seconds_per_step --> 4.874 | [2024-08-31 18:47:56,099][Main][INFO] - [train] Step 16725 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.698 | Lr --> 0.001 | Seconds_per_step --> 4.926 | [2024-08-31 18:49:57,801][Main][INFO] - [train] Step 16750 out of 20000 | Loss --> 1.742 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.684 | Lr --> 0.001 | Seconds_per_step --> 4.868 | [2024-08-31 18:51:59,437][Main][INFO] - [train] Step 16775 out of 20000 | Loss --> 1.755 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.665 | Lr --> 0.001 | Seconds_per_step --> 4.865 | [2024-08-31 18:54:00,855][Main][INFO] - [train] Step 16800 out of 20000 | Loss --> 1.747 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.651 | Lr --> 0.001 | Seconds_per_step --> 4.857 | [2024-08-31 18:56:03,853][Main][INFO] 
- [train] Step 16825 out of 20000 | Loss --> 1.743 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.632 | Lr --> 0.001 | Seconds_per_step --> 4.920 | [2024-08-31 18:58:05,201][Main][INFO] - [train] Step 16850 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.619 | Lr --> 0.001 | Seconds_per_step --> 4.854 | [2024-08-31 19:00:06,563][Main][INFO] - [train] Step 16875 out of 20000 | Loss --> 1.753 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.606 | Lr --> 0.001 | Seconds_per_step --> 4.854 | [2024-08-31 19:02:09,317][Main][INFO] - [train] Step 16900 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.591 | Lr --> 0.001 | Seconds_per_step --> 4.910 | [2024-08-31 19:04:10,472][Main][INFO] - [train] Step 16925 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.575 | Lr --> 0.001 | Seconds_per_step --> 4.846 | [2024-08-31 19:06:11,583][Main][INFO] - [train] Step 16950 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.559 | Lr --> 0.001 | Seconds_per_step --> 4.844 | [2024-08-31 19:08:14,363][Main][INFO] - [train] Step 16975 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.545 | Lr --> 0.001 | Seconds_per_step --> 4.911 | [2024-08-31 19:10:15,178][Main][INFO] - [train] Step 17000 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.533 | Lr --> 0.001 | Seconds_per_step --> 4.832 | [2024-08-31 19:12:16,110][Main][INFO] - [train] Step 17025 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.515 | Lr --> 0.001 | Seconds_per_step --> 4.837 | [2024-08-31 19:14:18,572][Main][INFO] - [train] Step 17050 out of 20000 | Loss --> 1.740 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.500 | Lr --> 0.001 | Seconds_per_step --> 4.898 | [2024-08-31 19:16:19,446][Main][INFO] - [train] Step 17075 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.485 | Lr --> 0.001 | Seconds_per_step --> 4.835 | [2024-08-31 
19:18:20,449][Main][INFO] - [train] Step 17100 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.471 | Lr --> 0.001 | Seconds_per_step --> 4.840 | [2024-08-31 19:20:23,025][Main][INFO] - [train] Step 17125 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.453 | Lr --> 0.001 | Seconds_per_step --> 4.903 | [2024-08-31 19:22:23,946][Main][INFO] - [train] Step 17150 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.439 | Lr --> 0.001 | Seconds_per_step --> 4.837 | [2024-08-31 19:24:25,361][Main][INFO] - [train] Step 17175 out of 20000 | Loss --> 1.732 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.422 | Lr --> 0.001 | Seconds_per_step --> 4.857 | [2024-08-31 19:26:26,446][Main][INFO] - [train] Step 17200 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.408 | Lr --> 0.001 | Seconds_per_step --> 4.843 | [2024-08-31 19:28:29,313][Main][INFO] - [train] Step 17225 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.395 | Lr --> 0.001 | Seconds_per_step --> 4.915 | [2024-08-31 19:30:30,859][Main][INFO] - [train] Step 17250 out of 20000 | Loss --> 1.737 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.382 | Lr --> 0.001 | Seconds_per_step --> 4.862 | [2024-08-31 19:32:32,558][Main][INFO] - [train] Step 17275 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.367 | Lr --> 0.001 | Seconds_per_step --> 4.868 | [2024-08-31 19:34:35,555][Main][INFO] - [train] Step 17300 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.349 | Lr --> 0.001 | Seconds_per_step --> 4.920 | [2024-08-31 19:36:37,051][Main][INFO] - [train] Step 17325 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.333 | Lr --> 0.001 | Seconds_per_step --> 4.860 | [2024-08-31 19:38:38,163][Main][INFO] - [train] Step 17350 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.315 | Lr --> 0.001 | Seconds_per_step --> 4.844 | 
[2024-08-31 19:40:41,121][Main][INFO] - [train] Step 17375 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.299 | Lr --> 0.001 | Seconds_per_step --> 4.918 | [2024-08-31 19:42:42,654][Main][INFO] - [train] Step 17400 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.281 | Lr --> 0.001 | Seconds_per_step --> 4.861 | [2024-08-31 19:44:44,126][Main][INFO] - [train] Step 17425 out of 20000 | Loss --> 1.713 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.264 | Lr --> 0.001 | Seconds_per_step --> 4.859 | [2024-08-31 19:46:47,266][Main][INFO] - [train] Step 17450 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.245 | Lr --> 0.001 | Seconds_per_step --> 4.925 | [2024-08-31 19:48:48,692][Main][INFO] - [train] Step 17475 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.227 | Lr --> 0.001 | Seconds_per_step --> 4.857 | [2024-08-31 19:50:50,523][Main][INFO] - [train] Step 17500 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.212 | Lr --> 0.001 | Seconds_per_step --> 4.873 | [2024-08-31 19:50:50,523][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-17500 [2024-08-31 19:50:50,530][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 19:50:57,172][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-17500/model.safetensors [2024-08-31 19:51:06,254][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-17500/optimizer.bin [2024-08-31 19:51:06,257][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-17500/scheduler.bin [2024-08-31 19:51:06,258][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-17500/sampler.bin [2024-08-31 19:51:06,260][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-17500/sampler_1.bin [2024-08-31 19:51:06,261][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-17500/random_states_0.pkl [2024-08-31 19:53:08,757][Main][INFO] - [train] Step 17525 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.193 | Lr --> 0.001 | Seconds_per_step --> 5.529 | [2024-08-31 19:55:09,923][Main][INFO] - [train] Step 17550 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.173 | Lr --> 0.001 | Seconds_per_step --> 4.847 | [2024-08-31 19:57:11,243][Main][INFO] - [train] Step 17575 out of 20000 | Loss --> 1.700 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.155 | Lr --> 0.001 | Seconds_per_step --> 4.853 | [2024-08-31 19:59:14,099][Main][INFO] - [train] Step 17600 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.137 | Lr --> 0.001 | Seconds_per_step --> 4.914 | [2024-08-31 20:01:15,562][Main][INFO] - [train] Step 17625 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.118 | Lr --> 0.001 | Seconds_per_step --> 4.858 | [2024-08-31 20:03:16,470][Main][INFO] - [train] Step 17650 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.097 | Lr --> 0.001 | Seconds_per_step --> 4.836 | [2024-08-31 20:05:17,916][Main][INFO] - [train] Step 17675 out of 
20000 | Loss --> 1.733 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.078 | Lr --> 0.001 | Seconds_per_step --> 4.858 | [2024-08-31 20:07:20,683][Main][INFO] - [train] Step 17700 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.062 | Lr --> 0.001 | Seconds_per_step --> 4.911 | [2024-08-31 20:09:22,414][Main][INFO] - [train] Step 17725 out of 20000 | Loss --> 1.707 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.041 | Lr --> 0.001 | Seconds_per_step --> 4.869 | [2024-08-31 20:11:24,033][Main][INFO] - [train] Step 17750 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.024 | Lr --> 0.001 | Seconds_per_step --> 4.865 | [2024-08-31 20:13:26,602][Main][INFO] - [train] Step 17775 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.007 | Lr --> 0.001 | Seconds_per_step --> 4.903 | [2024-08-31 20:15:27,607][Main][INFO] - [train] Step 17800 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.991 | Lr --> 0.001 | Seconds_per_step --> 4.840 | [2024-08-31 20:17:28,616][Main][INFO] - [train] Step 17825 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.976 | Lr --> 0.001 | Seconds_per_step --> 4.840 | [2024-08-31 20:19:31,033][Main][INFO] - [train] Step 17850 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.961 | Lr --> 0.001 | Seconds_per_step --> 4.897 | [2024-08-31 20:21:32,133][Main][INFO] - [train] Step 17875 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.946 | Lr --> 0.001 | Seconds_per_step --> 4.844 | [2024-08-31 20:23:33,151][Main][INFO] - [train] Step 17900 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.933 | Lr --> 0.000 | Seconds_per_step --> 4.841 | [2024-08-31 20:25:35,828][Main][INFO] - [train] Step 17925 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.921 | Lr --> 0.000 | Seconds_per_step --> 4.907 | [2024-08-31 20:27:36,892][Main][INFO] - [train] Step 17950 out 
of 20000 | Loss --> 1.725 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.911 | Lr --> 0.000 | Seconds_per_step --> 4.842 | [2024-08-31 20:29:38,066][Main][INFO] - [train] Step 17975 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.900 | Lr --> 0.000 | Seconds_per_step --> 4.847 | [2024-08-31 20:31:40,569][Main][INFO] - [train] Step 18000 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.891 | Lr --> 0.000 | Seconds_per_step --> 4.900 | [2024-08-31 20:33:41,408][Main][INFO] - [train] Step 18025 out of 20000 | Loss --> 1.739 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.884 | Lr --> 0.000 | Seconds_per_step --> 4.833 | [2024-08-31 20:35:42,352][Main][INFO] - [train] Step 18050 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.876 | Lr --> 0.000 | Seconds_per_step --> 4.838 | [2024-08-31 20:37:45,322][Main][INFO] - [train] Step 18075 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.869 | Lr --> 0.000 | Seconds_per_step --> 4.919 | [2024-08-31 20:39:46,981][Main][INFO] - [train] Step 18100 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.862 | Lr --> 0.000 | Seconds_per_step --> 4.866 | [2024-08-31 20:41:48,584][Main][INFO] - [train] Step 18125 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.855 | Lr --> 0.000 | Seconds_per_step --> 4.864 | [2024-08-31 20:43:49,907][Main][INFO] - [train] Step 18150 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.853 | [2024-08-31 20:45:52,968][Main][INFO] - [train] Step 18175 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.844 | Lr --> 0.000 | Seconds_per_step --> 4.922 | [2024-08-31 20:47:54,325][Main][INFO] - [train] Step 18200 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.854 | [2024-08-31 20:49:55,663][Main][INFO] - [train] Step 18225 
out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.195 | Weights_l2 --> 11270.840 | Lr --> 0.000 | Seconds_per_step --> 4.853 | [2024-08-31 20:51:58,657][Main][INFO] - [train] Step 18250 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.836 | Lr --> 0.000 | Seconds_per_step --> 4.920 | [2024-08-31 20:54:00,083][Main][INFO] - [train] Step 18275 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.834 | Lr --> 0.000 | Seconds_per_step --> 4.857 | [2024-08-31 20:56:01,850][Main][INFO] - [train] Step 18300 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.871 | [2024-08-31 20:58:04,690][Main][INFO] - [train] Step 18325 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.832 | Lr --> 0.000 | Seconds_per_step --> 4.913 | [2024-08-31 21:00:06,226][Main][INFO] - [train] Step 18350 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.832 | Lr --> 0.000 | Seconds_per_step --> 4.861 | [2024-08-31 21:02:07,970][Main][INFO] - [train] Step 18375 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.870 | [2024-08-31 21:04:11,035][Main][INFO] - [train] Step 18400 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.922 | [2024-08-31 21:06:12,731][Main][INFO] - [train] Step 18425 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.834 | Lr --> 0.000 | Seconds_per_step --> 4.868 | [2024-08-31 21:08:14,292][Main][INFO] - [train] Step 18450 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.836 | Lr --> 0.000 | Seconds_per_step --> 4.862 | [2024-08-31 21:10:17,481][Main][INFO] - [train] Step 18475 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.837 | Lr --> 0.000 | Seconds_per_step --> 4.927 | [2024-08-31 21:12:19,115][Main][INFO] - [train] Step 
18500 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.838 | Lr --> 0.000 | Seconds_per_step --> 4.865 | [2024-08-31 21:14:20,604][Main][INFO] - [train] Step 18525 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.839 | Lr --> 0.000 | Seconds_per_step --> 4.859 | [2024-08-31 21:16:24,832][Main][INFO] - [train] Step 18550 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.969 | [2024-08-31 21:18:26,217][Main][INFO] - [train] Step 18575 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.855 | [2024-08-31 21:20:27,684][Main][INFO] - [train] Step 18600 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.840 | Lr --> 0.000 | Seconds_per_step --> 4.859 | [2024-08-31 21:22:29,530][Main][INFO] - [train] Step 18625 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.874 | [2024-08-31 21:24:33,115][Main][INFO] - [train] Step 18650 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.195 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.943 | [2024-08-31 21:26:35,121][Main][INFO] - [train] Step 18675 out of 20000 | Loss --> 1.732 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.842 | Lr --> 0.000 | Seconds_per_step --> 4.880 | [2024-08-31 21:28:37,128][Main][INFO] - [train] Step 18700 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.843 | Lr --> 0.000 | Seconds_per_step --> 4.880 | [2024-08-31 21:30:40,396][Main][INFO] - [train] Step 18725 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.843 | Lr --> 0.000 | Seconds_per_step --> 4.931 | [2024-08-31 21:33:03,172][Main][INFO] - [train] Step 18750 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.844 | Lr --> 0.000 | Seconds_per_step --> 5.711 | [2024-08-31 21:35:04,721][Main][INFO] - [train] 
Step 18775 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.862 | [2024-08-31 21:37:08,128][Main][INFO] - [train] Step 18800 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.936 | [2024-08-31 21:39:09,857][Main][INFO] - [train] Step 18825 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.201 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.869 | [2024-08-31 21:41:11,700][Main][INFO] - [train] Step 18850 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 4.874 | [2024-08-31 21:43:15,117][Main][INFO] - [train] Step 18875 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.846 | Lr --> 0.000 | Seconds_per_step --> 4.937 | [2024-08-31 21:45:19,433][Main][INFO] - [train] Step 18900 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 4.973 | [2024-08-31 21:47:29,032][Main][INFO] - [train] Step 18925 out of 20000 | Loss --> 1.709 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 5.184 | [2024-08-31 21:49:33,512][Main][INFO] - [train] Step 18950 out of 20000 | Loss --> 1.731 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.848 | Lr --> 0.000 | Seconds_per_step --> 4.979 | [2024-08-31 21:51:35,196][Main][INFO] - [train] Step 18975 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.867 | [2024-08-31 21:53:36,788][Main][INFO] - [train] Step 19000 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.864 | [2024-08-31 21:55:38,313][Main][INFO] - [train] Step 19025 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.861 | [2024-08-31 21:57:41,329][Main][INFO] - 
[train] Step 19050 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.921 | [2024-08-31 21:59:42,853][Main][INFO] - [train] Step 19075 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.861 | [2024-08-31 22:01:44,492][Main][INFO] - [train] Step 19100 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.865 | [2024-08-31 22:03:47,660][Main][INFO] - [train] Step 19125 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.927 | [2024-08-31 22:05:49,133][Main][INFO] - [train] Step 19150 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.859 | [2024-08-31 22:07:50,623][Main][INFO] - [train] Step 19175 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.860 | [2024-08-31 22:09:53,873][Main][INFO] - [train] Step 19200 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.930 | [2024-08-31 22:11:55,529][Main][INFO] - [train] Step 19225 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.866 | [2024-08-31 22:13:57,272][Main][INFO] - [train] Step 19250 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.870 | [2024-08-31 22:16:01,229][Main][INFO] - [train] Step 19275 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.958 | [2024-08-31 22:18:03,766][Main][INFO] - [train] Step 19300 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.901 | [2024-08-31 22:20:06,053][Main][INFO] 
- [train] Step 19325 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.891 | [2024-08-31 22:22:09,832][Main][INFO] - [train] Step 19350 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.951 | [2024-08-31 22:24:11,539][Main][INFO] - [train] Step 19375 out of 20000 | Loss --> 1.707 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.868 | [2024-08-31 22:26:13,254][Main][INFO] - [train] Step 19400 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.869 | [2024-08-31 22:28:15,089][Main][INFO] - [train] Step 19425 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.873 | [2024-08-31 22:30:18,617][Main][INFO] - [train] Step 19450 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.941 | [2024-08-31 22:32:20,460][Main][INFO] - [train] Step 19475 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.874 | [2024-08-31 22:34:21,991][Main][INFO] - [train] Step 19500 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.861 | [2024-08-31 22:36:24,959][Main][INFO] - [train] Step 19525 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.919 | [2024-08-31 22:38:26,479][Main][INFO] - [train] Step 19550 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.861 | [2024-08-31 22:40:28,432][Main][INFO] - [train] Step 19575 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.878 | [2024-08-31 
22:42:31,962][Main][INFO] - [train] Step 19600 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.941 | [2024-08-31 22:44:33,970][Main][INFO] - [train] Step 19625 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.880 | [2024-08-31 22:46:38,990][Main][INFO] - [train] Step 19650 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.001 | [2024-08-31 22:48:42,541][Main][INFO] - [train] Step 19675 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.942 | [2024-08-31 22:50:48,499][Main][INFO] - [train] Step 19700 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.038 | [2024-08-31 22:52:50,152][Main][INFO] - [train] Step 19725 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.866 | [2024-08-31 22:54:53,252][Main][INFO] - [train] Step 19750 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.924 | [2024-08-31 22:56:55,118][Main][INFO] - [train] Step 19775 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.875 | [2024-08-31 22:58:56,863][Main][INFO] - [train] Step 19800 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.870 | [2024-08-31 23:01:01,950][Main][INFO] - [train] Step 19825 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.003 | [2024-08-31 23:03:04,913][Main][INFO] - [train] Step 19850 out of 20000 | Loss --> 1.713 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.918 | 
[2024-08-31 23:05:06,946][Main][INFO] - [train] Step 19875 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.881 | [2024-08-31 23:07:08,902][Main][INFO] - [train] Step 19900 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.878 | [2024-08-31 23:09:12,065][Main][INFO] - [train] Step 19925 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.926 | [2024-08-31 23:11:13,586][Main][INFO] - [train] Step 19950 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.861 | [2024-08-31 23:13:15,435][Main][INFO] - [train] Step 19975 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.874 | [2024-08-31 23:15:18,590][Main][INFO] - [train] Step 20000 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.926 | [2024-08-31 23:15:18,591][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 [2024-08-31 23:15:18,599][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 23:15:26,324][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors [2024-08-31 23:15:35,439][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin [2024-08-31 23:15:35,440][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin [2024-08-31 23:15:35,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin [2024-08-31 23:15:35,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin [2024-08-31 23:15:35,442][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl [2024-08-31 23:15:39,524][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 16 (max is dataset.n_shards=8). Stopping 8 dataloader workers. [2024-08-31 23:31:42,282][Main][INFO] - [eval] Step 20001 out of 20000 | Loss --> 2.073 | Accuracy --> 0.604 | Time --> 964.275 | [2024-08-31 23:31:42,287][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20001 [2024-08-31 23:31:42,295][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-31 23:31:50,975][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20001/model.safetensors [2024-08-31 23:32:00,717][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20001/optimizer.bin [2024-08-31 23:32:00,719][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20001/scheduler.bin [2024-08-31 23:32:00,720][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20001/sampler.bin [2024-08-31 23:32:00,720][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20001/sampler_1.bin [2024-08-31 23:32:00,721][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20001/random_states_0.pkl