[2024-10-20 18:25:17,510][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-10-20 18:25:17,521][Main][INFO] - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
[2024-10-20 18:25:17,522][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-10-20/18-25-17
[2024-10-20 18:31:35,111][Main][INFO] - [train] Step 25 out of 65536 | Loss --> 155.837 | Loss_ntp --> 76.275 | Loss_mlm --> 79.561 | Grad_l2 --> 476.354 | Weights_l2 --> 7701.821 | Lr --> 0.001 | Seconds_per_step --> 14.044 |
[2024-10-20 18:35:35,171][Main][INFO] - [train] Step 50 out of 65536 | Loss --> 98.644 | Loss_ntp --> 48.540 | Loss_mlm --> 50.105 | Grad_l2 --> 234.932 | Weights_l2 --> 7701.813 | Lr --> 0.001 | Seconds_per_step --> 9.602 |
[2024-10-20 18:39:35,197][Main][INFO] - [train] Step 75 out of 65536 | Loss --> 86.994 | Loss_ntp --> 42.861 | Loss_mlm --> 44.133 | Grad_l2 --> 180.388 | Weights_l2 --> 7701.806 | Lr --> 0.001 | Seconds_per_step --> 9.601 |
[2024-10-20 18:43:35,733][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 80.568 | Loss_ntp --> 39.806 | Loss_mlm --> 40.762 | Grad_l2 --> 156.732 | Weights_l2 --> 7701.800 | Lr --> 0.001 | Seconds_per_step --> 9.621 |
[2024-10-20 18:47:37,016][Main][INFO] - [train] Step 125 out of 65536 | Loss --> 77.131 | Loss_ntp --> 38.127 | Loss_mlm --> 39.004 | Grad_l2 --> 179.590 | Weights_l2 --> 7701.794 | Lr --> 0.001 | Seconds_per_step --> 9.651 |
[2024-10-20 18:51:38,437][Main][INFO] - [train] Step 150 out of 65536 | Loss --> 73.900 | Loss_ntp --> 36.620 | Loss_mlm --> 37.281 | Grad_l2 --> 161.591 | Weights_l2 --> 7701.789 | Lr --> 0.001 | Seconds_per_step --> 9.657 |
[2024-10-20 18:55:39,020][Main][INFO] - [train] Step 175 out of 65536 | Loss --> 72.118 | Loss_ntp --> 35.763 | Loss_mlm --> 36.355 | Grad_l2 --> 161.741 | Weights_l2 --> 7701.783 | Lr --> 0.001 | Seconds_per_step --> 9.623 |
[2024-10-20 18:59:40,344][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 70.712 | Loss_ntp --> 35.041 | Loss_mlm --> 35.671 | Grad_l2 --> 154.736 | Weights_l2 --> 7701.778 | Lr --> 0.001 | Seconds_per_step --> 9.653 |
[2024-10-20 19:03:39,817][Main][INFO] - [train] Step 225 out of 65536 | Loss --> 69.050 | Loss_ntp --> 34.233 | Loss_mlm --> 34.817 | Grad_l2 --> 106.908 | Weights_l2 --> 7701.772 | Lr --> 0.001 | Seconds_per_step --> 9.579 |
[2024-10-20 19:07:41,876][Main][INFO] - [train] Step 250 out of 65536 | Loss --> 68.595 | Loss_ntp --> 33.970 | Loss_mlm --> 34.625 | Grad_l2 --> 126.557 | Weights_l2 --> 7701.767 | Lr --> 0.001 | Seconds_per_step --> 9.682 |
[2024-10-20 19:11:43,944][Main][INFO] - [train] Step 275 out of 65536 | Loss --> 67.141 | Loss_ntp --> 33.297 | Loss_mlm --> 33.844 | Grad_l2 --> 114.874 | Weights_l2 --> 7701.762 | Lr --> 0.001 | Seconds_per_step --> 9.683 |
[2024-10-20 19:15:43,786][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 65.916 | Loss_ntp --> 32.693 | Loss_mlm --> 33.223 | Grad_l2 --> 89.430 | Weights_l2 --> 7701.757 | Lr --> 0.001 | Seconds_per_step --> 9.594 |
[2024-10-20 19:19:45,206][Main][INFO] - [train] Step 325 out of 65536 | Loss --> 65.322 | Loss_ntp --> 32.362 | Loss_mlm --> 32.960 | Grad_l2 --> 97.785 | Weights_l2 --> 7701.751 | Lr --> 0.001 | Seconds_per_step --> 9.657 |
[2024-10-20 19:23:45,072][Main][INFO] - [train] Step 350 out of 65536 | Loss --> 64.367 | Loss_ntp --> 31.937 | Loss_mlm --> 32.430 | Grad_l2 --> 83.882 | Weights_l2 --> 7701.746 | Lr --> 0.001 | Seconds_per_step --> 9.595 |
[2024-10-20 19:27:46,534][Main][INFO] - [train] Step 375 out of 65536 | Loss --> 63.409 | Loss_ntp --> 31.433 | Loss_mlm --> 31.975 | Grad_l2 --> 75.548 | Weights_l2 --> 7701.741 | Lr --> 0.001 | Seconds_per_step --> 9.658 |
[2024-10-20 19:31:45,390][Main][INFO] - [train] Step 400 out of 65536 | Loss --> 62.292 | Loss_ntp --> 30.925 | Loss_mlm --> 31.367 | Grad_l2 --> 72.299 | Weights_l2 --> 7701.736 | Lr --> 0.001 | Seconds_per_step --> 9.554 |
[2024-10-20 19:35:46,689][Main][INFO] - [train] Step 425 out of 65536 | Loss --> 61.685 | Loss_ntp --> 30.585 | Loss_mlm --> 31.100 | Grad_l2 --> 73.838 | Weights_l2 --> 7701.731 | Lr --> 0.001 | Seconds_per_step --> 9.652 |
[2024-10-20 19:39:46,030][Main][INFO] - [train] Step 450 out of 65536 | Loss --> 61.416 | Loss_ntp --> 30.509 | Loss_mlm --> 30.907 | Grad_l2 --> 79.820 | Weights_l2 --> 7701.726 | Lr --> 0.001 | Seconds_per_step --> 9.573 |
[2024-10-20 19:43:47,298][Main][INFO] - [train] Step 475 out of 65536 | Loss --> 60.536 | Loss_ntp --> 30.069 | Loss_mlm --> 30.467 | Grad_l2 --> 59.074 | Weights_l2 --> 7701.722 | Lr --> 0.001 | Seconds_per_step --> 9.651 |
[2024-10-20 19:47:48,778][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 60.085 | Loss_ntp --> 29.838 | Loss_mlm --> 30.246 | Grad_l2 --> 71.417 | Weights_l2 --> 7701.717 | Lr --> 0.001 | Seconds_per_step --> 9.659 |
[2024-10-20 19:49:25,862][Main][INFO] - [eval] Step 500 out of 65536 | Loss --> 57.611 | Loss_ntp --> 28.694 | Loss_mlm --> 28.917 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 97.080 |
[2024-10-20 19:53:26,482][Main][INFO] - [train] Step 525 out of 65536 | Loss --> 59.106 | Loss_ntp --> 29.371 | Loss_mlm --> 29.735 | Grad_l2 --> 56.829 | Weights_l2 --> 7701.712 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-20 19:57:25,811][Main][INFO] - [train] Step 550 out of 65536 | Loss --> 58.185 | Loss_ntp --> 28.950 | Loss_mlm --> 29.235 | Grad_l2 --> 56.368 | Weights_l2 --> 7701.707 | Lr --> 0.001 | Seconds_per_step --> 9.573 |
[2024-10-20 20:01:26,095][Main][INFO] - [train] Step 575 out of 65536 | Loss --> 57.301 | Loss_ntp --> 28.480 | Loss_mlm --> 28.821 | Grad_l2 --> 39.860 | Weights_l2 --> 7701.703 | Lr --> 0.001 | Seconds_per_step --> 9.611 |
[2024-10-20 20:05:26,649][Main][INFO] - [train] Step 600 out of 65536 | Loss --> 56.020 | Loss_ntp --> 27.906 | Loss_mlm --> 28.115 | Grad_l2 --> 35.414 | Weights_l2 --> 7701.698 | Lr --> 0.001 | Seconds_per_step --> 9.622 |
[2024-10-20 20:09:28,597][Main][INFO] - [train] Step 625 out of 65536 | Loss --> 55.363 | Loss_ntp --> 27.524 | Loss_mlm --> 27.840 | Grad_l2 --> 50.531 | Weights_l2 --> 7701.694 | Lr --> 0.001 | Seconds_per_step --> 9.678 |
[2024-10-20 20:13:29,399][Main][INFO] - [train] Step 650 out of 65536 | Loss --> 54.803 | Loss_ntp --> 27.252 | Loss_mlm --> 27.551 | Grad_l2 --> 56.108 | Weights_l2 --> 7701.689 | Lr --> 0.001 | Seconds_per_step --> 9.632 |
[2024-10-20 20:17:31,948][Main][INFO] - [train] Step 675 out of 65536 | Loss --> 53.970 | Loss_ntp --> 26.793 | Loss_mlm --> 27.176 | Grad_l2 --> 46.473 | Weights_l2 --> 7701.685 | Lr --> 0.001 | Seconds_per_step --> 9.702 |
[2024-10-20 20:21:31,196][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 53.056 | Loss_ntp --> 26.359 | Loss_mlm --> 26.697 | Grad_l2 --> 37.435 | Weights_l2 --> 7701.680 | Lr --> 0.001 | Seconds_per_step --> 9.570 |
[2024-10-20 20:25:33,347][Main][INFO] - [train] Step 725 out of 65536 | Loss --> 52.070 | Loss_ntp --> 25.876 | Loss_mlm --> 26.194 | Grad_l2 --> 43.881 | Weights_l2 --> 7701.676 | Lr --> 0.001 | Seconds_per_step --> 9.686 |
[2024-10-20 20:29:33,004][Main][INFO] - [train] Step 750 out of 65536 | Loss --> 51.191 | Loss_ntp --> 25.456 | Loss_mlm --> 25.735 | Grad_l2 --> 44.855 | Weights_l2 --> 7701.672 | Lr --> 0.001 | Seconds_per_step --> 9.586 |
[2024-10-20 20:33:34,557][Main][INFO] - [train] Step 775 out of 65536 | Loss --> 50.129 | Loss_ntp --> 24.891 | Loss_mlm --> 25.239 | Grad_l2 --> 40.117 | Weights_l2 --> 7701.667 | Lr --> 0.001 | Seconds_per_step --> 9.662 |
[2024-10-20 20:37:33,242][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 49.019 | Loss_ntp --> 24.361 | Loss_mlm --> 24.658 | Grad_l2 --> 39.953 | Weights_l2 --> 7701.663 | Lr --> 0.001 | Seconds_per_step --> 9.547 |
[2024-10-20 20:41:33,285][Main][INFO] - [train] Step 825 out of 65536 | Loss --> 48.160 | Loss_ntp --> 23.923 | Loss_mlm --> 24.238 | Grad_l2 --> 42.816 | Weights_l2 --> 7701.659 | Lr --> 0.001 | Seconds_per_step --> 9.602 |
[2024-10-20 20:45:34,352][Main][INFO] - [train] Step 850 out of 65536 | Loss --> 46.672 | Loss_ntp --> 23.149 | Loss_mlm --> 23.522 | Grad_l2 --> 42.230 | Weights_l2 --> 7701.654 | Lr --> 0.001 | Seconds_per_step --> 9.643 |
[2024-10-20 20:49:34,963][Main][INFO] - [train] Step 875 out of 65536 | Loss --> 44.855 | Loss_ntp --> 22.279 | Loss_mlm --> 22.575 | Grad_l2 --> 39.123 | Weights_l2 --> 7701.650 | Lr --> 0.001 | Seconds_per_step --> 9.624 |
[2024-10-20 20:53:36,677][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 42.480 | Loss_ntp --> 21.057 | Loss_mlm --> 21.423 | Grad_l2 --> 50.501 | Weights_l2 --> 7701.645 | Lr --> 0.001 | Seconds_per_step --> 9.668 |
[2024-10-20 20:57:37,186][Main][INFO] - [train] Step 925 out of 65536 | Loss --> 40.028 | Loss_ntp --> 19.877 | Loss_mlm --> 20.151 | Grad_l2 --> 57.109 | Weights_l2 --> 7701.640 | Lr --> 0.001 | Seconds_per_step --> 9.620 |
[2024-10-20 21:01:38,800][Main][INFO] - [train] Step 950 out of 65536 | Loss --> 37.058 | Loss_ntp --> 18.359 | Loss_mlm --> 18.699 | Grad_l2 --> 78.443 | Weights_l2 --> 7701.634 | Lr --> 0.001 | Seconds_per_step --> 9.664 |
[2024-10-20 21:05:38,405][Main][INFO] - [train] Step 975 out of 65536 | Loss --> 33.534 | Loss_ntp --> 16.618 | Loss_mlm --> 16.917 | Grad_l2 --> 87.220 | Weights_l2 --> 7701.628 | Lr --> 0.001 | Seconds_per_step --> 9.584 |
[2024-10-20 21:09:41,153][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 29.988 | Loss_ntp --> 14.857 | Loss_mlm --> 15.131 | Grad_l2 --> 88.279 | Weights_l2 --> 7701.622 | Lr --> 0.001 | Seconds_per_step --> 9.710 |
[2024-10-20 21:10:10,310][Main][INFO] - [eval] Step 1000 out of 65536 | Loss --> 28.033 | Loss_ntp --> 13.938 | Loss_mlm --> 14.095 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 29.143 |
[2024-10-20 21:14:10,580][Main][INFO] - [train] Step 1025 out of 65536 | Loss --> 26.588 | Loss_ntp --> 13.166 | Loss_mlm --> 13.423 | Grad_l2 --> 109.226 | Weights_l2 --> 7701.616 | Lr --> 0.001 | Seconds_per_step --> 9.611 |
[2024-10-20 21:18:12,558][Main][INFO] - [train] Step 1050 out of 65536 | Loss --> 23.850 | Loss_ntp --> 11.830 | Loss_mlm --> 12.020 | Grad_l2 --> 98.666 | Weights_l2 --> 7701.610 | Lr --> 0.001 | Seconds_per_step --> 9.679 |
[2024-10-20 21:22:11,593][Main][INFO] - [train] Step 1075 out of 65536 | Loss --> 21.589 | Loss_ntp --> 10.697 | Loss_mlm --> 10.892 | Grad_l2 --> 104.858 | Weights_l2 --> 7701.605 | Lr --> 0.001 | Seconds_per_step --> 9.561 |
[2024-10-20 21:26:13,779][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 19.443 | Loss_ntp --> 9.626 | Loss_mlm --> 9.817 | Grad_l2 --> 75.473 | Weights_l2 --> 7701.599 | Lr --> 0.001 | Seconds_per_step --> 9.687 |
[2024-10-20 21:30:13,762][Main][INFO] - [train] Step 1125 out of 65536 | Loss --> 17.771 | Loss_ntp --> 8.793 | Loss_mlm --> 8.978 | Grad_l2 --> 55.492 | Weights_l2 --> 7701.593 | Lr --> 0.001 | Seconds_per_step --> 9.599 |
[2024-10-20 21:34:14,478][Main][INFO] - [train] Step 1150 out of 65536 | Loss --> 17.092 | Loss_ntp --> 8.462 | Loss_mlm --> 8.630 | Grad_l2 --> 72.673 | Weights_l2 --> 7701.587 | Lr --> 0.001 | Seconds_per_step --> 9.629 |
[2024-10-20 21:38:14,797][Main][INFO] - [train] Step 1175 out of 65536 | Loss --> 16.731 | Loss_ntp --> 8.294 | Loss_mlm --> 8.437 | Grad_l2 --> 60.718 | Weights_l2 --> 7701.582 | Lr --> 0.001 | Seconds_per_step --> 9.613 |
[2024-10-20 21:42:15,467][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 16.522 | Loss_ntp --> 8.188 | Loss_mlm --> 8.334 | Grad_l2 --> 62.414 | Weights_l2 --> 7701.577 | Lr --> 0.001 | Seconds_per_step --> 9.627 |
[2024-10-20 21:46:15,957][Main][INFO] - [train] Step 1225 out of 65536 | Loss --> 16.336 | Loss_ntp --> 8.096 | Loss_mlm --> 8.240 | Grad_l2 --> 57.944 | Weights_l2 --> 7701.572 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-20 21:50:15,276][Main][INFO] - [train] Step 1250 out of 65536 | Loss --> 16.167 | Loss_ntp --> 8.006 | Loss_mlm --> 8.161 | Grad_l2 --> 42.899 | Weights_l2 --> 7701.567 | Lr --> 0.001 | Seconds_per_step --> 9.573 |
[2024-10-20 21:54:18,039][Main][INFO] - [train] Step 1275 out of 65536 | Loss --> 16.183 | Loss_ntp --> 8.017 | Loss_mlm --> 8.166 | Grad_l2 --> 48.492 | Weights_l2 --> 7701.563 | Lr --> 0.001 | Seconds_per_step --> 9.710 |
[2024-10-20 21:58:18,396][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 15.988 | Loss_ntp --> 7.926 | Loss_mlm --> 8.063 | Grad_l2 --> 42.852 | Weights_l2 --> 7701.558 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-20 22:02:20,263][Main][INFO] - [train] Step 1325 out of 65536 | Loss --> 15.982 | Loss_ntp --> 7.916 | Loss_mlm --> 8.066 | Grad_l2 --> 47.218 | Weights_l2 --> 7701.553 | Lr --> 0.001 | Seconds_per_step --> 9.675 |
[2024-10-20 22:06:20,739][Main][INFO] - [train] Step 1350 out of 65536 | Loss --> 15.830 | Loss_ntp --> 7.838 | Loss_mlm --> 7.992 | Grad_l2 --> 28.805 | Weights_l2 --> 7701.549 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-20 22:10:23,190][Main][INFO] - [train] Step 1375 out of 65536 | Loss --> 15.806 | Loss_ntp --> 7.839 | Loss_mlm --> 7.967 | Grad_l2 --> 37.388 | Weights_l2 --> 7701.544 | Lr --> 0.001 | Seconds_per_step --> 9.698 |
[2024-10-20 22:14:23,525][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 15.775 | Loss_ntp --> 7.813 | Loss_mlm --> 7.962 | Grad_l2 --> 35.380 | Weights_l2 --> 7701.540 | Lr --> 0.001 | Seconds_per_step --> 9.613 |
[2024-10-20 22:18:25,080][Main][INFO] - [train] Step 1425 out of 65536 | Loss --> 15.722 | Loss_ntp --> 7.794 | Loss_mlm --> 7.928 | Grad_l2 --> 34.978 | Weights_l2 --> 7701.535 | Lr --> 0.001 | Seconds_per_step --> 9.662 |
[2024-10-20 22:22:24,651][Main][INFO] - [train] Step 1450 out of 65536 | Loss --> 15.638 | Loss_ntp --> 7.739 | Loss_mlm --> 7.899 | Grad_l2 --> 24.003 | Weights_l2 --> 7701.530 | Lr --> 0.001 | Seconds_per_step --> 9.583 |
[2024-10-20 22:26:24,495][Main][INFO] - [train] Step 1475 out of 65536 | Loss --> 15.682 | Loss_ntp --> 7.768 | Loss_mlm --> 7.913 | Grad_l2 --> 27.599 | Weights_l2 --> 7701.526 | Lr --> 0.001 | Seconds_per_step --> 9.594 |
[2024-10-20 22:30:25,992][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 15.638 | Loss_ntp --> 7.754 | Loss_mlm --> 7.884 | Grad_l2 --> 22.985 | Weights_l2 --> 7701.521 | Lr --> 0.001 | Seconds_per_step --> 9.660 |
[2024-10-20 22:30:54,697][Main][INFO] - [eval] Step 1500 out of 65536 | Loss --> 15.664 | Loss_ntp --> 7.782 | Loss_mlm --> 7.882 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.700 |
[2024-10-20 22:30:54,709][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-1500
[2024-10-20 22:30:54,719][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-10-20 22:30:59,988][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-1500/model.safetensors
[2024-10-20 22:31:08,673][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-1500/optimizer.bin
[2024-10-20 22:31:08,682][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-1500/scheduler.bin
[2024-10-20 22:31:08,684][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-1500/sampler.bin
[2024-10-20 22:31:08,686][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-1500/sampler_1.bin
[2024-10-20 22:31:08,694][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-1500/random_states_0.pkl
[2024-10-20 22:35:09,885][Main][INFO] - [train] Step 1525 out of 65536 | Loss --> 15.740 | Loss_ntp --> 7.803 | Loss_mlm --> 7.937 | Grad_l2 --> 35.476 | Weights_l2 --> 7701.516 | Lr --> 0.001 | Seconds_per_step --> 10.207 |
[2024-10-20 22:39:10,189][Main][INFO] - [train] Step 1550 out of 65536 | Loss --> 15.717 | Loss_ntp --> 7.796 | Loss_mlm --> 7.921 | Grad_l2 --> 32.209 | Weights_l2 --> 7701.511 | Lr --> 0.001 | Seconds_per_step --> 9.612 |
[2024-10-20 22:43:12,020][Main][INFO] - [train] Step 1575 out of 65536 | Loss --> 15.723 | Loss_ntp --> 7.805 | Loss_mlm --> 7.918 | Grad_l2 --> 35.393 | Weights_l2 --> 7701.506 | Lr --> 0.001 | Seconds_per_step --> 9.673 |
[2024-10-20 22:47:13,492][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 15.617 | Loss_ntp --> 7.752 | Loss_mlm --> 7.865 | Grad_l2 --> 29.357 | Weights_l2 --> 7701.502 | Lr --> 0.001 | Seconds_per_step --> 9.659 |
[2024-10-20 22:51:13,978][Main][INFO] - [train] Step 1625 out of 65536 | Loss --> 15.532 | Loss_ntp --> 7.709 | Loss_mlm --> 7.822 | Grad_l2 --> 18.501 | Weights_l2 --> 7701.497 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-20 22:55:14,600][Main][INFO] - [train] Step 1650 out of 65536 | Loss --> 15.565 | Loss_ntp --> 7.720 | Loss_mlm --> 7.845 | Grad_l2 --> 17.546 | Weights_l2 --> 7701.493 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-20 22:59:14,384][Main][INFO] - [train] Step 1675 out of 65536 | Loss --> 15.576 | Loss_ntp --> 7.737 | Loss_mlm --> 7.838 | Grad_l2 --> 23.599 | Weights_l2 --> 7701.489 | Lr --> 0.001 | Seconds_per_step --> 9.591 |
[2024-10-20 23:03:16,878][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 15.612 | Loss_ntp --> 7.757 | Loss_mlm --> 7.855 | Grad_l2 --> 28.685 | Weights_l2 --> 7701.484 | Lr --> 0.001 | Seconds_per_step --> 9.700 |
[2024-10-20 23:07:16,611][Main][INFO] - [train] Step 1725 out of 65536 | Loss --> 15.590 | Loss_ntp --> 7.728 | Loss_mlm --> 7.861 | Grad_l2 --> 22.357 | Weights_l2 --> 7701.479 | Lr --> 0.001 | Seconds_per_step --> 9.589 |
[2024-10-20 23:11:18,435][Main][INFO] - [train] Step 1750 out of 65536 | Loss --> 15.475 | Loss_ntp --> 7.683 | Loss_mlm --> 7.792 | Grad_l2 --> 20.808 | Weights_l2 --> 7701.475 | Lr --> 0.001 | Seconds_per_step --> 9.673 |
[2024-10-20 23:15:17,324][Main][INFO] - [train] Step 1775 out of 65536 | Loss --> 15.422 | Loss_ntp --> 7.655 | Loss_mlm --> 7.767 | Grad_l2 --> 16.928 | Weights_l2 --> 7701.470 | Lr --> 0.001 | Seconds_per_step --> 9.555 |
[2024-10-20 23:19:17,823][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 15.370 | Loss_ntp --> 7.625 | Loss_mlm --> 7.745 | Grad_l2 --> 16.147 | Weights_l2 --> 7701.466 | Lr --> 0.001 | Seconds_per_step --> 9.620 |
[2024-10-20 23:23:19,005][Main][INFO] - [train] Step 1825 out of 65536 | Loss --> 15.363 | Loss_ntp --> 7.629 | Loss_mlm --> 7.734 | Grad_l2 --> 19.934 | Weights_l2 --> 7701.462 | Lr --> 0.001 | Seconds_per_step --> 9.647 |
[2024-10-20 23:27:17,933][Main][INFO] - [train] Step 1850 out of 65536 | Loss --> 15.347 | Loss_ntp --> 7.616 | Loss_mlm --> 7.732 | Grad_l2 --> 25.592 | Weights_l2 --> 7701.457 | Lr --> 0.001 | Seconds_per_step --> 9.557 |
[2024-10-20 23:31:19,805][Main][INFO] - [train] Step 1875 out of 65536 | Loss --> 15.254 | Loss_ntp --> 7.577 | Loss_mlm --> 7.677 | Grad_l2 --> 19.500 | Weights_l2 --> 7701.453 | Lr --> 0.001 | Seconds_per_step --> 9.675 |
[2024-10-20 23:35:18,582][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 15.204 | Loss_ntp --> 7.550 | Loss_mlm --> 7.653 | Grad_l2 --> 15.358 | Weights_l2 --> 7701.448 | Lr --> 0.001 | Seconds_per_step --> 9.551 |
[2024-10-20 23:39:20,300][Main][INFO] - [train] Step 1925 out of 65536 | Loss --> 15.153 | Loss_ntp --> 7.525 | Loss_mlm --> 7.628 | Grad_l2 --> 13.241 | Weights_l2 --> 7701.445 | Lr --> 0.001 | Seconds_per_step --> 9.669 |
[2024-10-20 23:43:21,680][Main][INFO] - [train] Step 1950 out of 65536 | Loss --> 15.111 | Loss_ntp --> 7.497 | Loss_mlm --> 7.614 | Grad_l2 --> 13.357 | Weights_l2 --> 7701.441 | Lr --> 0.001 | Seconds_per_step --> 9.655 |
[2024-10-20 23:47:22,111][Main][INFO] - [train] Step 1975 out of 65536 | Loss --> 15.072 | Loss_ntp --> 7.475 | Loss_mlm --> 7.597 | Grad_l2 --> 15.485 | Weights_l2 --> 7701.437 | Lr --> 0.001 | Seconds_per_step --> 9.617 |
[2024-10-20 23:51:21,960][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 15.061 | Loss_ntp --> 7.470 | Loss_mlm --> 7.591 | Grad_l2 --> 15.511 | Weights_l2 --> 7701.432 | Lr --> 0.001 | Seconds_per_step --> 9.594 |
[2024-10-20 23:51:50,849][Main][INFO] - [eval] Step 2000 out of 65536 | Loss --> 15.092 | Loss_ntp --> 7.501 | Loss_mlm --> 7.591 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.883 |
[2024-10-20 23:55:53,490][Main][INFO] - [train] Step 2025 out of 65536 | Loss --> 15.080 | Loss_ntp --> 7.479 | Loss_mlm --> 7.601 | Grad_l2 --> 17.451 | Weights_l2 --> 7701.428 | Lr --> 0.001 | Seconds_per_step --> 9.705 |
[2024-10-20 23:59:53,747][Main][INFO] - [train] Step 2050 out of 65536 | Loss --> 14.998 | Loss_ntp --> 7.447 | Loss_mlm --> 7.551 | Grad_l2 --> 13.242 | Weights_l2 --> 7701.424 | Lr --> 0.001 | Seconds_per_step --> 9.610 |
[2024-10-21 00:03:57,114][Main][INFO] - [train] Step 2075 out of 65536 | Loss --> 14.994 | Loss_ntp --> 7.431 | Loss_mlm --> 7.562 | Grad_l2 --> 17.409 | Weights_l2 --> 7701.419 | Lr --> 0.001 | Seconds_per_step --> 9.735 |
[2024-10-21 00:07:56,557][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 14.993 | Loss_ntp --> 7.437 | Loss_mlm --> 7.556 | Grad_l2 --> 23.374 | Weights_l2 --> 7701.414 | Lr --> 0.001 | Seconds_per_step --> 9.578 |
[2024-10-21 00:11:56,818][Main][INFO] - [train] Step 2125 out of 65536 | Loss --> 14.963 | Loss_ntp --> 7.428 | Loss_mlm --> 7.535 | Grad_l2 --> 24.857 | Weights_l2 --> 7701.410 | Lr --> 0.001 | Seconds_per_step --> 9.610 |
[2024-10-21 00:15:56,927][Main][INFO] - [train] Step 2150 out of 65536 | Loss --> 14.829 | Loss_ntp --> 7.354 | Loss_mlm --> 7.474 | Grad_l2 --> 14.538 | Weights_l2 --> 7701.405 | Lr --> 0.001 | Seconds_per_step --> 9.604 |
[2024-10-21 00:19:57,089][Main][INFO] - [train] Step 2175 out of 65536 | Loss --> 14.797 | Loss_ntp --> 7.344 | Loss_mlm --> 7.453 | Grad_l2 --> 13.598 | Weights_l2 --> 7701.400 | Lr --> 0.001 | Seconds_per_step --> 9.606 |
[2024-10-21 00:23:58,135][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 14.774 | Loss_ntp --> 7.321 | Loss_mlm --> 7.454 | Grad_l2 --> 13.339 | Weights_l2 --> 7701.396 | Lr --> 0.001 | Seconds_per_step --> 9.642 |
[2024-10-21 00:27:58,499][Main][INFO] - [train] Step 2225 out of 65536 | Loss --> 14.671 | Loss_ntp --> 7.284 | Loss_mlm --> 7.387 | Grad_l2 --> 13.884 | Weights_l2 --> 7701.392 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-21 00:31:59,596][Main][INFO] - [train] Step 2250 out of 65536 | Loss --> 14.635 | Loss_ntp --> 7.264 | Loss_mlm --> 7.371 | Grad_l2 --> 11.527 | Weights_l2 --> 7701.388 | Lr --> 0.001 | Seconds_per_step --> 9.644 |
[2024-10-21 00:35:58,256][Main][INFO] - [train] Step 2275 out of 65536 | Loss --> 14.593 | Loss_ntp --> 7.247 | Loss_mlm --> 7.345 | Grad_l2 --> 9.993 | Weights_l2 --> 7701.384 | Lr --> 0.001 | Seconds_per_step --> 9.546 |
[2024-10-21 00:39:59,379][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 14.543 | Loss_ntp --> 7.216 | Loss_mlm --> 7.327 | Grad_l2 --> 12.147 | Weights_l2 --> 7701.381 | Lr --> 0.001 | Seconds_per_step --> 9.644 |
[2024-10-21 00:43:59,080][Main][INFO] - [train] Step 2325 out of 65536 | Loss --> 14.577 | Loss_ntp --> 7.231 | Loss_mlm --> 7.345 | Grad_l2 --> 12.365 | Weights_l2 --> 7701.376 | Lr --> 0.001 | Seconds_per_step --> 9.588 |
[2024-10-21 00:47:59,811][Main][INFO] - [train] Step 2350 out of 65536 | Loss --> 14.512 | Loss_ntp --> 7.202 | Loss_mlm --> 7.310 | Grad_l2 --> 12.472 | Weights_l2 --> 7701.372 | Lr --> 0.001 | Seconds_per_step --> 9.629 |
[2024-10-21 00:51:58,749][Main][INFO] - [train] Step 2375 out of 65536 | Loss --> 14.434 | Loss_ntp --> 7.166 | Loss_mlm --> 7.268 | Grad_l2 --> 12.198 | Weights_l2 --> 7701.368 | Lr --> 0.001 | Seconds_per_step --> 9.557 |
[2024-10-21 00:55:58,527][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 14.390 | Loss_ntp --> 7.141 | Loss_mlm --> 7.249 | Grad_l2 --> 11.488 | Weights_l2 --> 7701.365 | Lr --> 0.001 | Seconds_per_step --> 9.591 |
[2024-10-21 00:59:59,746][Main][INFO] - [train] Step 2425 out of 65536 | Loss --> 14.396 | Loss_ntp --> 7.142 | Loss_mlm --> 7.253 | Grad_l2 --> 11.924 | Weights_l2 --> 7701.361 | Lr --> 0.001 | Seconds_per_step --> 9.649 |
[2024-10-21 01:03:58,922][Main][INFO] - [train] Step 2450 out of 65536 | Loss --> 14.319 | Loss_ntp --> 7.108 | Loss_mlm --> 7.211 | Grad_l2 --> 11.587 | Weights_l2 --> 7701.357 | Lr --> 0.001 | Seconds_per_step --> 9.567 |
[2024-10-21 01:08:00,577][Main][INFO] - [train] Step 2475 out of 65536 | Loss --> 14.363 | Loss_ntp --> 7.132 | Loss_mlm --> 7.231 | Grad_l2 --> 11.854 | Weights_l2 --> 7701.353 | Lr --> 0.001 | Seconds_per_step --> 9.666 |
[2024-10-21 01:12:00,070][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 14.333 | Loss_ntp --> 7.121 | Loss_mlm --> 7.212 | Grad_l2 --> 10.363 | Weights_l2 --> 7701.349 | Lr --> 0.001 | Seconds_per_step --> 9.580 |
[2024-10-21 01:12:28,480][Main][INFO] - [eval] Step 2500 out of 65536 | Loss --> 14.573 | Loss_ntp --> 7.286 | Loss_mlm --> 7.287 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.404 |
[2024-10-21 01:16:30,064][Main][INFO] - [train] Step 2525 out of 65536 | Loss --> 14.280 | Loss_ntp --> 7.089 | Loss_mlm --> 7.192 | Grad_l2 --> 13.178 | Weights_l2 --> 7701.345 | Lr --> 0.001 | Seconds_per_step --> 9.663 |
[2024-10-21 01:20:29,018][Main][INFO] - [train] Step 2550 out of 65536 | Loss --> 14.260 | Loss_ntp --> 7.091 | Loss_mlm --> 7.169 | Grad_l2 --> 12.381 | Weights_l2 --> 7701.341 | Lr --> 0.001 | Seconds_per_step --> 9.558 |
[2024-10-21 01:24:31,253][Main][INFO] - [train] Step 2575 out of 65536 | Loss --> 14.259 | Loss_ntp --> 7.078 | Loss_mlm --> 7.182 | Grad_l2 --> 11.247 | Weights_l2 --> 7701.337 | Lr --> 0.001 | Seconds_per_step --> 9.689 |
[2024-10-21 01:28:31,446][Main][INFO] - [train] Step 2600 out of 65536 | Loss --> 14.259 | Loss_ntp --> 7.080 | Loss_mlm --> 7.179 | Grad_l2 --> 12.524 | Weights_l2 --> 7701.333 | Lr --> 0.001 | Seconds_per_step --> 9.608 |
[2024-10-21 01:32:31,794][Main][INFO] - [train] Step 2625 out of 65536 | Loss --> 14.245 | Loss_ntp --> 7.068 | Loss_mlm --> 7.178 | Grad_l2 --> 12.087 | Weights_l2 --> 7701.330 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-21 01:36:32,411][Main][INFO] - [train] Step 2650 out of 65536 | Loss --> 14.247 | Loss_ntp --> 7.074 | Loss_mlm --> 7.173 | Grad_l2 --> 11.638 | Weights_l2 --> 7701.326 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-21 01:40:33,462][Main][INFO] - [train] Step 2675 out of 65536 | Loss --> 14.274 | Loss_ntp --> 7.086 | Loss_mlm --> 7.189 | Grad_l2 --> 10.415 | Weights_l2 --> 7701.322 | Lr --> 0.001 | Seconds_per_step --> 9.642 |
[2024-10-21 01:44:33,254][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 14.276 | Loss_ntp --> 7.097 | Loss_mlm --> 7.179 | Grad_l2 --> 10.830 | Weights_l2 --> 7701.318 | Lr --> 0.001 | Seconds_per_step --> 9.592 |
[2024-10-21 01:48:34,104][Main][INFO] - [train] Step 2725 out of 65536 | Loss --> 14.322 | Loss_ntp --> 7.117 | Loss_mlm --> 7.205 | Grad_l2 --> 11.668 | Weights_l2 --> 7701.314 | Lr --> 0.001 | Seconds_per_step --> 9.634 |
[2024-10-21 01:52:33,834][Main][INFO] - [train] Step 2750 out of 65536 | Loss --> 14.393 | Loss_ntp --> 7.149 | Loss_mlm --> 7.244 | Grad_l2 --> 10.585 | Weights_l2 --> 7701.310 | Lr --> 0.001 | Seconds_per_step --> 9.589 |
[2024-10-21 01:56:33,130][Main][INFO] - [train] Step 2775 out of 65536 | Loss --> 14.326 | Loss_ntp --> 7.124 | Loss_mlm --> 7.202 | Grad_l2 --> 9.862 | Weights_l2 --> 7701.306 | Lr --> 0.001 | Seconds_per_step --> 9.572 |
[2024-10-21 02:00:34,375][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 14.354 | Loss_ntp --> 7.134 | Loss_mlm --> 7.220 | Grad_l2 --> 8.484 | Weights_l2 --> 7701.302 | Lr --> 0.001 | Seconds_per_step --> 9.650 |
[2024-10-21 02:04:34,763][Main][INFO] - [train] Step 2825 out of 65536 | Loss --> 14.320 | Loss_ntp --> 7.118 | Loss_mlm --> 7.202 | Grad_l2 --> 11.118 | Weights_l2 --> 7701.298 | Lr --> 0.001 | Seconds_per_step --> 9.615 |
[2024-10-21 02:08:35,157][Main][INFO] - [train] Step 2850 out of 65536 | Loss --> 14.323 | Loss_ntp --> 7.124 | Loss_mlm --> 7.199 | Grad_l2 --> 10.821 | Weights_l2 --> 7701.294 | Lr --> 0.001 | Seconds_per_step --> 9.616 |
[2024-10-21 02:12:34,860][Main][INFO] - [train] Step 2875 out of 65536 | Loss --> 14.348 | Loss_ntp --> 7.129 | Loss_mlm --> 7.219 | Grad_l2 --> 9.481 | Weights_l2 --> 7701.291 | Lr --> 0.001 | Seconds_per_step --> 9.588 |
[2024-10-21 02:16:36,448][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 14.413 | Loss_ntp --> 7.163 | Loss_mlm --> 7.250 | Grad_l2 --> 10.586 | Weights_l2 --> 7701.287 | Lr --> 0.001 | Seconds_per_step --> 9.663 |
[2024-10-21 02:20:36,563][Main][INFO] - [train] Step 2925 out of 65536 | Loss --> 14.319 | Loss_ntp --> 7.113 | Loss_mlm --> 7.206 | Grad_l2 --> 9.175 | Weights_l2 --> 7701.283 | Lr --> 0.001 | Seconds_per_step --> 9.604 |
[2024-10-21 02:24:36,522][Main][INFO] - [train] Step 2950 out of 65536 | Loss --> 14.292 | Loss_ntp --> 7.112 | Loss_mlm --> 7.179 | Grad_l2 --> 10.380 | Weights_l2 --> 7701.279 | Lr --> 0.001 | Seconds_per_step --> 9.598 |
[2024-10-21 02:28:36,510][Main][INFO] - [train] Step 2975 out of 65536 | Loss --> 14.202 | Loss_ntp --> 7.068 | Loss_mlm --> 7.134 | Grad_l2 --> 9.622 | Weights_l2 --> 7701.276 | Lr --> 0.001 | Seconds_per_step --> 9.599 |
[2024-10-21 02:32:38,120][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 14.214 | Loss_ntp --> 7.066 | Loss_mlm --> 7.147 | Grad_l2 --> 10.228 | Weights_l2 --> 7701.272 | Lr --> 0.001 | Seconds_per_step --> 9.664 |
[2024-10-21 02:33:06,984][Main][INFO] - [eval] Step 3000 out of 65536 | Loss --> 14.236 | Loss_ntp --> 7.111 | Loss_mlm --> 7.125 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.858 |
[2024-10-21 02:33:06,988][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-3000
[2024-10-21 02:33:07,000][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-10-21 02:33:13,140][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-3000/model.safetensors
[2024-10-21 02:33:21,968][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-3000/optimizer.bin
[2024-10-21 02:33:21,978][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-3000/scheduler.bin
[2024-10-21 02:33:21,979][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-3000/sampler.bin
[2024-10-21 02:33:21,981][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-3000/sampler_1.bin
[2024-10-21 02:33:21,990][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-3000/random_states_0.pkl
[2024-10-21 02:37:21,949][Main][INFO] - [train] Step 3025 out of 65536 | Loss --> 14.180 | Loss_ntp --> 7.041 | Loss_mlm --> 7.138 | Grad_l2 --> 9.928 | Weights_l2 --> 7701.268 | Lr --> 0.001 | Seconds_per_step --> 10.198 |
[2024-10-21 02:41:23,436][Main][INFO] - [train] Step 3050 out of 65536 | Loss --> 14.163 | Loss_ntp --> 7.032 | Loss_mlm --> 7.130 | Grad_l2 --> 9.909 | Weights_l2 --> 7701.264 | Lr --> 0.001 | Seconds_per_step --> 9.659 |
[2024-10-21 02:45:23,362][Main][INFO] - [train] Step 3075 out of 65536 | Loss --> 14.109 | Loss_ntp --> 7.016 | Loss_mlm --> 7.093 | Grad_l2 --> 10.119 | Weights_l2 --> 7701.260 | Lr --> 0.001 | Seconds_per_step --> 9.597 |
[2024-10-21 02:49:23,828][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 14.053 | Loss_ntp --> 6.981 | Loss_mlm --> 7.072 | Grad_l2 --> 8.917 | Weights_l2 --> 7701.256 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-21 02:53:26,144][Main][INFO] - [train] Step 3125 out of 65536 | Loss --> 14.045 | Loss_ntp --> 6.975 | Loss_mlm --> 7.069 | Grad_l2 --> 11.184 | Weights_l2 --> 7701.252 | Lr --> 0.001 | Seconds_per_step --> 9.692 |
[2024-10-21 02:57:25,035][Main][INFO] - [train] Step 3150 out of 65536 | Loss --> 14.006 | Loss_ntp --> 6.959 | Loss_mlm --> 7.047 | Grad_l2 --> 9.280 | Weights_l2 --> 7701.248 | Lr --> 0.001 | Seconds_per_step --> 9.555 |
[2024-10-21 03:01:27,283][Main][INFO] - [train] Step 3175 out of 65536 | Loss --> 13.943 | Loss_ntp --> 6.924 | Loss_mlm --> 7.020 | Grad_l2 --> 8.769 | Weights_l2 --> 7701.245 | Lr --> 0.001 | Seconds_per_step --> 9.690 |
[2024-10-21 03:05:27,701][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 13.956 | Loss_ntp --> 6.916 | Loss_mlm --> 7.040 | Grad_l2 --> 8.625 | Weights_l2 --> 7701.241 | Lr --> 0.001 | Seconds_per_step --> 9.617 |
[2024-10-21 03:09:28,530][Main][INFO] - [train] Step 3225 out of 65536 | Loss --> 13.916 | Loss_ntp --> 6.906 | Loss_mlm --> 7.010 | Grad_l2 --> 9.378 | Weights_l2 --> 7701.238 | Lr --> 0.001 | Seconds_per_step --> 9.633 |
[2024-10-21 03:13:28,937][Main][INFO] - [train] Step 3250 out of 65536 | Loss --> 13.849 | Loss_ntp --> 6.867 | Loss_mlm --> 6.982 | Grad_l2 --> 9.221 | Weights_l2 --> 7701.234 | Lr --> 0.001 | Seconds_per_step --> 9.616 |
[2024-10-21 03:17:29,597][Main][INFO] - [train] Step 3275 out of 65536 | Loss --> 13.854 | Loss_ntp --> 6.869 | Loss_mlm --> 6.985 | Grad_l2 --> 8.561 | Weights_l2 --> 7701.230 | Lr --> 0.001 | Seconds_per_step --> 9.626 |
[2024-10-21 03:21:30,034][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 13.781 | Loss_ntp --> 6.843 | Loss_mlm --> 6.938 | Grad_l2 --> 8.919 | Weights_l2 --> 7701.226 | Lr --> 0.001 | Seconds_per_step --> 9.617 |
[2024-10-21 03:25:29,815][Main][INFO] - [train] Step 3325 out of 65536 | Loss --> 13.766 | Loss_ntp --> 6.836 | Loss_mlm --> 6.930 | Grad_l2 --> 8.129 | Weights_l2 --> 7701.223 | Lr --> 0.001 | Seconds_per_step --> 9.591 |
[2024-10-21 03:29:30,344][Main][INFO] - [train] Step 3350 out of 65536 | Loss --> 13.726 | Loss_ntp --> 6.809 | Loss_mlm --> 6.917 | Grad_l2 --> 9.145 | Weights_l2 --> 7701.219 | Lr --> 0.001 | Seconds_per_step --> 9.620 |
[2024-10-21 03:33:30,171][Main][INFO] - [train] Step 3375 out of 65536 | Loss --> 13.751 | Loss_ntp --> 6.819 | Loss_mlm --> 6.932 | Grad_l2 --> 11.666 | Weights_l2 --> 7701.215 | Lr --> 0.001 | Seconds_per_step --> 9.593 |
[2024-10-21 03:37:32,111][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 13.700 | Loss_ntp --> 6.796 | Loss_mlm --> 6.905 | Grad_l2 --> 8.776 | Weights_l2 --> 7701.211 | Lr --> 0.001 | Seconds_per_step --> 9.677 |
[2024-10-21 03:41:31,530][Main][INFO] - [train] Step 3425 out of 65536 | Loss --> 13.641 | Loss_ntp --> 6.774 | Loss_mlm --> 6.868 | Grad_l2 --> 9.206 | Weights_l2 --> 7701.207 | Lr --> 0.001 | Seconds_per_step --> 9.577 |
[2024-10-21 03:45:33,625][Main][INFO] - [train] Step 3450 out of 65536 | Loss --> 13.588 | Loss_ntp --> 6.735 | Loss_mlm --> 6.852 | Grad_l2 --> 6.293 | Weights_l2 --> 7701.204 | Lr --> 0.001 | Seconds_per_step --> 9.684 |
[2024-10-21 03:49:34,400][Main][INFO] - [train] Step 3475 out of 65536 | Loss --> 13.615 | Loss_ntp --> 6.748 | Loss_mlm --> 6.868 | Grad_l2 --> 9.161 | Weights_l2 --> 7701.201 | Lr --> 0.001 | Seconds_per_step --> 9.631 |
[2024-10-21 03:53:35,824][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 13.532 | Loss_ntp --> 6.707 | Loss_mlm --> 6.825 | Grad_l2 --> 9.556 | Weights_l2 --> 7701.197 | Lr --> 0.001 | Seconds_per_step --> 9.657 |
[2024-10-21 03:54:04,713][Main][INFO] - [eval] Step 3500 out of 65536 | Loss --> 13.912 | Loss_ntp --> 6.950 | Loss_mlm --> 6.962 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.883 |
[2024-10-21 03:58:05,620][Main][INFO] - [train] Step 3525 out of 65536 | Loss --> 13.463 | Loss_ntp --> 6.677 | Loss_mlm --> 6.786 | Grad_l2 --> 9.458 | Weights_l2 --> 7701.193 | Lr --> 0.001 | Seconds_per_step --> 9.636 |
[2024-10-21 04:02:06,516][Main][INFO] - [train] Step 3550 out of 65536 | Loss --> 13.419 | Loss_ntp --> 6.654 | Loss_mlm --> 6.766 | Grad_l2 --> 9.819 | Weights_l2 --> 7701.188 | Lr --> 0.001 | Seconds_per_step --> 9.636 |
[2024-10-21 04:06:07,229][Main][INFO] - [train] Step 3575 out of 65536 | Loss --> 13.362 | Loss_ntp --> 6.626 | Loss_mlm --> 6.736 | Grad_l2 --> 8.944 | Weights_l2 --> 7701.184 | Lr --> 0.001 | Seconds_per_step --> 9.628 |
[2024-10-21 04:10:08,761][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 13.401 | Loss_ntp --> 6.628 | Loss_mlm --> 6.773 | Grad_l2 --> 9.904 | Weights_l2 --> 7701.180 | Lr --> 0.001 | Seconds_per_step --> 9.661 |
[2024-10-21 04:14:09,815][Main][INFO] - [train] Step 3625 out of 65536 | Loss --> 13.361 | Loss_ntp --> 6.625 | Loss_mlm --> 6.736 | Grad_l2 --> 8.507 | Weights_l2 --> 7701.176 | Lr --> 0.001 | Seconds_per_step --> 9.642 |
[2024-10-21 04:18:10,037][Main][INFO] - [train] Step 3650 out of 65536 | Loss --> 13.355 | Loss_ntp --> 6.614 | Loss_mlm --> 6.741 | Grad_l2 --> 9.056 | Weights_l2 --> 7701.172 | Lr --> 0.001 | Seconds_per_step --> 9.609 |
[2024-10-21 04:22:10,677][Main][INFO] - [train] Step 3675 out of 65536 | Loss --> 13.306 | Loss_ntp --> 6.586 | Loss_mlm --> 6.720 | Grad_l2 --> 9.057 | Weights_l2 --> 7701.168 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-21 04:26:12,857][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 13.325 | Loss_ntp --> 6.596 | Loss_mlm --> 6.729 | Grad_l2 --> 10.732 | Weights_l2 --> 7701.163 | Lr --> 0.001 | Seconds_per_step --> 9.687 |
[2024-10-21 04:30:11,816][Main][INFO] - [train] Step 3725 out of 65536 | Loss --> 13.239 | Loss_ntp --> 6.561 | Loss_mlm --> 6.678 | Grad_l2 --> 9.810 | Weights_l2 --> 7701.160 | Lr --> 0.001 | Seconds_per_step --> 9.558 |
[2024-10-21 04:34:12,167][Main][INFO] - [train] Step 3750 out of 65536 | Loss --> 13.211 | Loss_ntp --> 6.534 | Loss_mlm --> 6.677 | Grad_l2 --> 10.011 | Weights_l2 --> 7701.156 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-21 04:38:14,046][Main][INFO] - [train] Step 3775 out of 65536 | Loss --> 13.214 | Loss_ntp --> 6.537 | Loss_mlm --> 6.678 | Grad_l2 --> 8.939 | Weights_l2 --> 7701.152 | Lr --> 0.001 | Seconds_per_step --> 9.675 |
[2024-10-21 04:42:14,454][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 13.148 | Loss_ntp --> 6.508 | Loss_mlm --> 6.640 | Grad_l2 --> 9.513 | Weights_l2 --> 7701.148 | Lr --> 0.001 | Seconds_per_step --> 9.616 |
[2024-10-21 04:46:14,554][Main][INFO] - [train] Step 3825 out of 65536 | Loss --> 13.172 | Loss_ntp --> 6.514 | Loss_mlm --> 6.658 | Grad_l2 --> 9.295 | Weights_l2 --> 7701.144 | Lr --> 0.001 | Seconds_per_step --> 9.604 |
[2024-10-21 04:50:14,762][Main][INFO] - [train] Step 3850 out of 65536 | Loss --> 13.118 | Loss_ntp --> 6.494 | Loss_mlm --> 6.624 | Grad_l2 --> 7.890 | Weights_l2 --> 7701.140 | Lr --> 0.001 | Seconds_per_step --> 9.608 |
[2024-10-21 04:54:16,032][Main][INFO] - [train] Step 3875 out of 65536 | Loss --> 13.179 | Loss_ntp --> 6.521 | Loss_mlm --> 6.657 | Grad_l2 --> 9.901 | Weights_l2 --> 7701.136 | Lr --> 0.001 | Seconds_per_step --> 9.651 |
[2024-10-21 04:58:16,128][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 13.259 | Loss_ntp --> 6.571 | Loss_mlm --> 6.687 | Grad_l2 --> 8.910 | Weights_l2 --> 7701.132 | Lr --> 0.001 | Seconds_per_step --> 9.604 |