[2024-10-20 18:25:17,510][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-10-20 18:25:17,521][Main][INFO] - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

[2024-10-20 18:25:17,522][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-10-20/18-25-17
[2024-10-20 18:31:35,111][Main][INFO] - [train] Step 25 out of 65536 | Loss --> 155.837 | Loss_ntp --> 76.275 | Loss_mlm --> 79.561 | Grad_l2 --> 476.354 | Weights_l2 --> 7701.821 | Lr --> 0.001 | Seconds_per_step --> 14.044 | 
[2024-10-20 18:35:35,171][Main][INFO] - [train] Step 50 out of 65536 | Loss --> 98.644 | Loss_ntp --> 48.540 | Loss_mlm --> 50.105 | Grad_l2 --> 234.932 | Weights_l2 --> 7701.813 | Lr --> 0.001 | Seconds_per_step --> 9.602 | 
[2024-10-20 18:39:35,197][Main][INFO] - [train] Step 75 out of 65536 | Loss --> 86.994 | Loss_ntp --> 42.861 | Loss_mlm --> 44.133 | Grad_l2 --> 180.388 | Weights_l2 --> 7701.806 | Lr --> 0.001 | Seconds_per_step --> 9.601 | 
[2024-10-20 18:43:35,733][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 80.568 | Loss_ntp --> 39.806 | Loss_mlm --> 40.762 | Grad_l2 --> 156.732 | Weights_l2 --> 7701.800 | Lr --> 0.001 | Seconds_per_step --> 9.621 | 
[2024-10-20 18:47:37,016][Main][INFO] - [train] Step 125 out of 65536 | Loss --> 77.131 | Loss_ntp --> 38.127 | Loss_mlm --> 39.004 | Grad_l2 --> 179.590 | Weights_l2 --> 7701.794 | Lr --> 0.001 | Seconds_per_step --> 9.651 | 
[2024-10-20 18:51:38,437][Main][INFO] - [train] Step 150 out of 65536 | Loss --> 73.900 | Loss_ntp --> 36.620 | Loss_mlm --> 37.281 | Grad_l2 --> 161.591 | Weights_l2 --> 7701.789 | Lr --> 0.001 | Seconds_per_step --> 9.657 | 
[2024-10-20 18:55:39,020][Main][INFO] - [train] Step 175 out of 65536 | Loss --> 72.118 | Loss_ntp --> 35.763 | Loss_mlm --> 36.355 | Grad_l2 --> 161.741 | Weights_l2 --> 7701.783 | Lr --> 0.001 | Seconds_per_step --> 9.623 | 
[2024-10-20 18:59:40,344][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 70.712 | Loss_ntp --> 35.041 | Loss_mlm --> 35.671 | Grad_l2 --> 154.736 | Weights_l2 --> 7701.778 | Lr --> 0.001 | Seconds_per_step --> 9.653 | 
[2024-10-20 19:03:39,817][Main][INFO] - [train] Step 225 out of 65536 | Loss --> 69.050 | Loss_ntp --> 34.233 | Loss_mlm --> 34.817 | Grad_l2 --> 106.908 | Weights_l2 --> 7701.772 | Lr --> 0.001 | Seconds_per_step --> 9.579 | 
[2024-10-20 19:07:41,876][Main][INFO] - [train] Step 250 out of 65536 | Loss --> 68.595 | Loss_ntp --> 33.970 | Loss_mlm --> 34.625 | Grad_l2 --> 126.557 | Weights_l2 --> 7701.767 | Lr --> 0.001 | Seconds_per_step --> 9.682 | 
[2024-10-20 19:11:43,944][Main][INFO] - [train] Step 275 out of 65536 | Loss --> 67.141 | Loss_ntp --> 33.297 | Loss_mlm --> 33.844 | Grad_l2 --> 114.874 | Weights_l2 --> 7701.762 | Lr --> 0.001 | Seconds_per_step --> 9.683 | 
[2024-10-20 19:15:43,786][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 65.916 | Loss_ntp --> 32.693 | Loss_mlm --> 33.223 | Grad_l2 --> 89.430 | Weights_l2 --> 7701.757 | Lr --> 0.001 | Seconds_per_step --> 9.594 | 
[2024-10-20 19:19:45,206][Main][INFO] - [train] Step 325 out of 65536 | Loss --> 65.322 | Loss_ntp --> 32.362 | Loss_mlm --> 32.960 | Grad_l2 --> 97.785 | Weights_l2 --> 7701.751 | Lr --> 0.001 | Seconds_per_step --> 9.657 | 
[2024-10-20 19:23:45,072][Main][INFO] - [train] Step 350 out of 65536 | Loss --> 64.367 | Loss_ntp --> 31.937 | Loss_mlm --> 32.430 | Grad_l2 --> 83.882 | Weights_l2 --> 7701.746 | Lr --> 0.001 | Seconds_per_step --> 9.595 | 
[2024-10-20 19:27:46,534][Main][INFO] - [train] Step 375 out of 65536 | Loss --> 63.409 | Loss_ntp --> 31.433 | Loss_mlm --> 31.975 | Grad_l2 --> 75.548 | Weights_l2 --> 7701.741 | Lr --> 0.001 | Seconds_per_step --> 9.658 | 
[2024-10-20 19:31:45,390][Main][INFO] - [train] Step 400 out of 65536 | Loss --> 62.292 | Loss_ntp --> 30.925 | Loss_mlm --> 31.367 | Grad_l2 --> 72.299 | Weights_l2 --> 7701.736 | Lr --> 0.001 | Seconds_per_step --> 9.554 | 
[2024-10-20 19:35:46,689][Main][INFO] - [train] Step 425 out of 65536 | Loss --> 61.685 | Loss_ntp --> 30.585 | Loss_mlm --> 31.100 | Grad_l2 --> 73.838 | Weights_l2 --> 7701.731 | Lr --> 0.001 | Seconds_per_step --> 9.652 | 
[2024-10-20 19:39:46,030][Main][INFO] - [train] Step 450 out of 65536 | Loss --> 61.416 | Loss_ntp --> 30.509 | Loss_mlm --> 30.907 | Grad_l2 --> 79.820 | Weights_l2 --> 7701.726 | Lr --> 0.001 | Seconds_per_step --> 9.573 | 
[2024-10-20 19:43:47,298][Main][INFO] - [train] Step 475 out of 65536 | Loss --> 60.536 | Loss_ntp --> 30.069 | Loss_mlm --> 30.467 | Grad_l2 --> 59.074 | Weights_l2 --> 7701.722 | Lr --> 0.001 | Seconds_per_step --> 9.651 | 
[2024-10-20 19:47:48,778][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 60.085 | Loss_ntp --> 29.838 | Loss_mlm --> 30.246 | Grad_l2 --> 71.417 | Weights_l2 --> 7701.717 | Lr --> 0.001 | Seconds_per_step --> 9.659 | 
[2024-10-20 19:49:25,862][Main][INFO] - [eval] Step 500 out of 65536 | Loss --> 57.611 | Loss_ntp --> 28.694 | Loss_mlm --> 28.917 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 97.080 | 
[2024-10-20 19:53:26,482][Main][INFO] - [train] Step 525 out of 65536 | Loss --> 59.106 | Loss_ntp --> 29.371 | Loss_mlm --> 29.735 | Grad_l2 --> 56.829 | Weights_l2 --> 7701.712 | Lr --> 0.001 | Seconds_per_step --> 9.625 | 
[2024-10-20 19:57:25,811][Main][INFO] - [train] Step 550 out of 65536 | Loss --> 58.185 | Loss_ntp --> 28.950 | Loss_mlm --> 29.235 | Grad_l2 --> 56.368 | Weights_l2 --> 7701.707 | Lr --> 0.001 | Seconds_per_step --> 9.573 | 
[2024-10-20 20:01:26,095][Main][INFO] - [train] Step 575 out of 65536 | Loss --> 57.301 | Loss_ntp --> 28.480 | Loss_mlm --> 28.821 | Grad_l2 --> 39.860 | Weights_l2 --> 7701.703 | Lr --> 0.001 | Seconds_per_step --> 9.611 | 
[2024-10-20 20:05:26,649][Main][INFO] - [train] Step 600 out of 65536 | Loss --> 56.020 | Loss_ntp --> 27.906 | Loss_mlm --> 28.115 | Grad_l2 --> 35.414 | Weights_l2 --> 7701.698 | Lr --> 0.001 | Seconds_per_step --> 9.622 | 
[2024-10-20 20:09:28,597][Main][INFO] - [train] Step 625 out of 65536 | Loss --> 55.363 | Loss_ntp --> 27.524 | Loss_mlm --> 27.840 | Grad_l2 --> 50.531 | Weights_l2 --> 7701.694 | Lr --> 0.001 | Seconds_per_step --> 9.678 | 
[2024-10-20 20:13:29,399][Main][INFO] - [train] Step 650 out of 65536 | Loss --> 54.803 | Loss_ntp --> 27.252 | Loss_mlm --> 27.551 | Grad_l2 --> 56.108 | Weights_l2 --> 7701.689 | Lr --> 0.001 | Seconds_per_step --> 9.632 | 
[2024-10-20 20:17:31,948][Main][INFO] - [train] Step 675 out of 65536 | Loss --> 53.970 | Loss_ntp --> 26.793 | Loss_mlm --> 27.176 | Grad_l2 --> 46.473 | Weights_l2 --> 7701.685 | Lr --> 0.001 | Seconds_per_step --> 9.702 | 
[2024-10-20 20:21:31,196][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 53.056 | Loss_ntp --> 26.359 | Loss_mlm --> 26.697 | Grad_l2 --> 37.435 | Weights_l2 --> 7701.680 | Lr --> 0.001 | Seconds_per_step --> 9.570 | 
[2024-10-20 20:25:33,347][Main][INFO] - [train] Step 725 out of 65536 | Loss --> 52.070 | Loss_ntp --> 25.876 | Loss_mlm --> 26.194 | Grad_l2 --> 43.881 | Weights_l2 --> 7701.676 | Lr --> 0.001 | Seconds_per_step --> 9.686 | 
[2024-10-20 20:29:33,004][Main][INFO] - [train] Step 750 out of 65536 | Loss --> 51.191 | Loss_ntp --> 25.456 | Loss_mlm --> 25.735 | Grad_l2 --> 44.855 | Weights_l2 --> 7701.672 | Lr --> 0.001 | Seconds_per_step --> 9.586 | 
[2024-10-20 20:33:34,557][Main][INFO] - [train] Step 775 out of 65536 | Loss --> 50.129 | Loss_ntp --> 24.891 | Loss_mlm --> 25.239 | Grad_l2 --> 40.117 | Weights_l2 --> 7701.667 | Lr --> 0.001 | Seconds_per_step --> 9.662 | 
[2024-10-20 20:37:33,242][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 49.019 | Loss_ntp --> 24.361 | Loss_mlm --> 24.658 | Grad_l2 --> 39.953 | Weights_l2 --> 7701.663 | Lr --> 0.001 | Seconds_per_step --> 9.547 | 
[2024-10-20 20:41:33,285][Main][INFO] - [train] Step 825 out of 65536 | Loss --> 48.160 | Loss_ntp --> 23.923 | Loss_mlm --> 24.238 | Grad_l2 --> 42.816 | Weights_l2 --> 7701.659 | Lr --> 0.001 | Seconds_per_step --> 9.602 | 
[2024-10-20 20:45:34,352][Main][INFO] - [train] Step 850 out of 65536 | Loss --> 46.672 | Loss_ntp --> 23.149 | Loss_mlm --> 23.522 | Grad_l2 --> 42.230 | Weights_l2 --> 7701.654 | Lr --> 0.001 | Seconds_per_step --> 9.643 | 
[2024-10-20 20:49:34,963][Main][INFO] - [train] Step 875 out of 65536 | Loss --> 44.855 | Loss_ntp --> 22.279 | Loss_mlm --> 22.575 | Grad_l2 --> 39.123 | Weights_l2 --> 7701.650 | Lr --> 0.001 | Seconds_per_step --> 9.624 | 
[2024-10-20 20:53:36,677][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 42.480 | Loss_ntp --> 21.057 | Loss_mlm --> 21.423 | Grad_l2 --> 50.501 | Weights_l2 --> 7701.645 | Lr --> 0.001 | Seconds_per_step --> 9.668 | 
[2024-10-20 20:57:37,186][Main][INFO] - [train] Step 925 out of 65536 | Loss --> 40.028 | Loss_ntp --> 19.877 | Loss_mlm --> 20.151 | Grad_l2 --> 57.109 | Weights_l2 --> 7701.640 | Lr --> 0.001 | Seconds_per_step --> 9.620 | 
[2024-10-20 21:01:38,800][Main][INFO] - [train] Step 950 out of 65536 | Loss --> 37.058 | Loss_ntp --> 18.359 | Loss_mlm --> 18.699 | Grad_l2 --> 78.443 | Weights_l2 --> 7701.634 | Lr --> 0.001 | Seconds_per_step --> 9.664 | 
[2024-10-20 21:05:38,405][Main][INFO] - [train] Step 975 out of 65536 | Loss --> 33.534 | Loss_ntp --> 16.618 | Loss_mlm --> 16.917 | Grad_l2 --> 87.220 | Weights_l2 --> 7701.628 | Lr --> 0.001 | Seconds_per_step --> 9.584 | 
[2024-10-20 21:09:41,153][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 29.988 | Loss_ntp --> 14.857 | Loss_mlm --> 15.131 | Grad_l2 --> 88.279 | Weights_l2 --> 7701.622 | Lr --> 0.001 | Seconds_per_step --> 9.710 | 
[2024-10-20 21:10:10,310][Main][INFO] - [eval] Step 1000 out of 65536 | Loss --> 28.033 | Loss_ntp --> 13.938 | Loss_mlm --> 14.095 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 29.143 | 
[2024-10-20 21:14:10,580][Main][INFO] - [train] Step 1025 out of 65536 | Loss --> 26.588 | Loss_ntp --> 13.166 | Loss_mlm --> 13.423 | Grad_l2 --> 109.226 | Weights_l2 --> 7701.616 | Lr --> 0.001 | Seconds_per_step --> 9.611 | 
[2024-10-20 21:18:12,558][Main][INFO] - [train] Step 1050 out of 65536 | Loss --> 23.850 | Loss_ntp --> 11.830 | Loss_mlm --> 12.020 | Grad_l2 --> 98.666 | Weights_l2 --> 7701.610 | Lr --> 0.001 | Seconds_per_step --> 9.679 | 
[2024-10-20 21:22:11,593][Main][INFO] - [train] Step 1075 out of 65536 | Loss --> 21.589 | Loss_ntp --> 10.697 | Loss_mlm --> 10.892 | Grad_l2 --> 104.858 | Weights_l2 --> 7701.605 | Lr --> 0.001 | Seconds_per_step --> 9.561 | 
[2024-10-20 21:26:13,779][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 19.443 | Loss_ntp --> 9.626 | Loss_mlm --> 9.817 | Grad_l2 --> 75.473 | Weights_l2 --> 7701.599 | Lr --> 0.001 | Seconds_per_step --> 9.687 | 
[2024-10-20 21:30:13,762][Main][INFO] - [train] Step 1125 out of 65536 | Loss --> 17.771 | Loss_ntp --> 8.793 | Loss_mlm --> 8.978 | Grad_l2 --> 55.492 | Weights_l2 --> 7701.593 | Lr --> 0.001 | Seconds_per_step --> 9.599 | 
[2024-10-20 21:34:14,478][Main][INFO] - [train] Step 1150 out of 65536 | Loss --> 17.092 | Loss_ntp --> 8.462 | Loss_mlm --> 8.630 | Grad_l2 --> 72.673 | Weights_l2 --> 7701.587 | Lr --> 0.001 | Seconds_per_step --> 9.629 | 
[2024-10-20 21:38:14,797][Main][INFO] - [train] Step 1175 out of 65536 | Loss --> 16.731 | Loss_ntp --> 8.294 | Loss_mlm --> 8.437 | Grad_l2 --> 60.718 | Weights_l2 --> 7701.582 | Lr --> 0.001 | Seconds_per_step --> 9.613 | 
[2024-10-20 21:42:15,467][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 16.522 | Loss_ntp --> 8.188 | Loss_mlm --> 8.334 | Grad_l2 --> 62.414 | Weights_l2 --> 7701.577 | Lr --> 0.001 | Seconds_per_step --> 9.627 | 
[2024-10-20 21:46:15,957][Main][INFO] - [train] Step 1225 out of 65536 | Loss --> 16.336 | Loss_ntp --> 8.096 | Loss_mlm --> 8.240 | Grad_l2 --> 57.944 | Weights_l2 --> 7701.572 | Lr --> 0.001 | Seconds_per_step --> 9.619 | 
[2024-10-20 21:50:15,276][Main][INFO] - [train] Step 1250 out of 65536 | Loss --> 16.167 | Loss_ntp --> 8.006 | Loss_mlm --> 8.161 | Grad_l2 --> 42.899 | Weights_l2 --> 7701.567 | Lr --> 0.001 | Seconds_per_step --> 9.573 | 
[2024-10-20 21:54:18,039][Main][INFO] - [train] Step 1275 out of 65536 | Loss --> 16.183 | Loss_ntp --> 8.017 | Loss_mlm --> 8.166 | Grad_l2 --> 48.492 | Weights_l2 --> 7701.563 | Lr --> 0.001 | Seconds_per_step --> 9.710 | 
[2024-10-20 21:58:18,396][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 15.988 | Loss_ntp --> 7.926 | Loss_mlm --> 8.063 | Grad_l2 --> 42.852 | Weights_l2 --> 7701.558 | Lr --> 0.001 | Seconds_per_step --> 9.614 | 
[2024-10-20 22:02:20,263][Main][INFO] - [train] Step 1325 out of 65536 | Loss --> 15.982 | Loss_ntp --> 7.916 | Loss_mlm --> 8.066 | Grad_l2 --> 47.218 | Weights_l2 --> 7701.553 | Lr --> 0.001 | Seconds_per_step --> 9.675 | 
[2024-10-20 22:06:20,739][Main][INFO] - [train] Step 1350 out of 65536 | Loss --> 15.830 | Loss_ntp --> 7.838 | Loss_mlm --> 7.992 | Grad_l2 --> 28.805 | Weights_l2 --> 7701.549 | Lr --> 0.001 | Seconds_per_step --> 9.619 | 
[2024-10-20 22:10:23,190][Main][INFO] - [train] Step 1375 out of 65536 | Loss --> 15.806 | Loss_ntp --> 7.839 | Loss_mlm --> 7.967 | Grad_l2 --> 37.388 | Weights_l2 --> 7701.544 | Lr --> 0.001 | Seconds_per_step --> 9.698 | 
[2024-10-20 22:14:23,525][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 15.775 | Loss_ntp --> 7.813 | Loss_mlm --> 7.962 | Grad_l2 --> 35.380 | Weights_l2 --> 7701.540 | Lr --> 0.001 | Seconds_per_step --> 9.613 | 
[2024-10-20 22:18:25,080][Main][INFO] - [train] Step 1425 out of 65536 | Loss --> 15.722 | Loss_ntp --> 7.794 | Loss_mlm --> 7.928 | Grad_l2 --> 34.978 | Weights_l2 --> 7701.535 | Lr --> 0.001 | Seconds_per_step --> 9.662 | 
[2024-10-20 22:22:24,651][Main][INFO] - [train] Step 1450 out of 65536 | Loss --> 15.638 | Loss_ntp --> 7.739 | Loss_mlm --> 7.899 | Grad_l2 --> 24.003 | Weights_l2 --> 7701.530 | Lr --> 0.001 | Seconds_per_step --> 9.583 | 
[2024-10-20 22:26:24,495][Main][INFO] - [train] Step 1475 out of 65536 | Loss --> 15.682 | Loss_ntp --> 7.768 | Loss_mlm --> 7.913 | Grad_l2 --> 27.599 | Weights_l2 --> 7701.526 | Lr --> 0.001 | Seconds_per_step --> 9.594 | 
[2024-10-20 22:30:25,992][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 15.638 | Loss_ntp --> 7.754 | Loss_mlm --> 7.884 | Grad_l2 --> 22.985 | Weights_l2 --> 7701.521 | Lr --> 0.001 | Seconds_per_step --> 9.660 | 
[2024-10-20 22:30:54,697][Main][INFO] - [eval] Step 1500 out of 65536 | Loss --> 15.664 | Loss_ntp --> 7.782 | Loss_mlm --> 7.882 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.700 | 
[2024-10-20 22:30:54,709][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-1500
[2024-10-20 22:30:54,719][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-10-20 22:30:59,988][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-1500/model.safetensors
[2024-10-20 22:31:08,673][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-1500/optimizer.bin
[2024-10-20 22:31:08,682][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-1500/scheduler.bin
[2024-10-20 22:31:08,684][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-1500/sampler.bin
[2024-10-20 22:31:08,686][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-1500/sampler_1.bin
[2024-10-20 22:31:08,694][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-1500/random_states_0.pkl
[2024-10-20 22:35:09,885][Main][INFO] - [train] Step 1525 out of 65536 | Loss --> 15.740 | Loss_ntp --> 7.803 | Loss_mlm --> 7.937 | Grad_l2 --> 35.476 | Weights_l2 --> 7701.516 | Lr --> 0.001 | Seconds_per_step --> 10.207 | 
[2024-10-20 22:39:10,189][Main][INFO] - [train] Step 1550 out of 65536 | Loss --> 15.717 | Loss_ntp --> 7.796 | Loss_mlm --> 7.921 | Grad_l2 --> 32.209 | Weights_l2 --> 7701.511 | Lr --> 0.001 | Seconds_per_step --> 9.612 | 
[2024-10-20 22:43:12,020][Main][INFO] - [train] Step 1575 out of 65536 | Loss --> 15.723 | Loss_ntp --> 7.805 | Loss_mlm --> 7.918 | Grad_l2 --> 35.393 | Weights_l2 --> 7701.506 | Lr --> 0.001 | Seconds_per_step --> 9.673 | 
[2024-10-20 22:47:13,492][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 15.617 | Loss_ntp --> 7.752 | Loss_mlm --> 7.865 | Grad_l2 --> 29.357 | Weights_l2 --> 7701.502 | Lr --> 0.001 | Seconds_per_step --> 9.659 | 
[2024-10-20 22:51:13,978][Main][INFO] - [train] Step 1625 out of 65536 | Loss --> 15.532 | Loss_ntp --> 7.709 | Loss_mlm --> 7.822 | Grad_l2 --> 18.501 | Weights_l2 --> 7701.497 | Lr --> 0.001 | Seconds_per_step --> 9.619 | 
[2024-10-20 22:55:14,600][Main][INFO] - [train] Step 1650 out of 65536 | Loss --> 15.565 | Loss_ntp --> 7.720 | Loss_mlm --> 7.845 | Grad_l2 --> 17.546 | Weights_l2 --> 7701.493 | Lr --> 0.001 | Seconds_per_step --> 9.625 | 
[2024-10-20 22:59:14,384][Main][INFO] - [train] Step 1675 out of 65536 | Loss --> 15.576 | Loss_ntp --> 7.737 | Loss_mlm --> 7.838 | Grad_l2 --> 23.599 | Weights_l2 --> 7701.489 | Lr --> 0.001 | Seconds_per_step --> 9.591 | 
[2024-10-20 23:03:16,878][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 15.612 | Loss_ntp --> 7.757 | Loss_mlm --> 7.855 | Grad_l2 --> 28.685 | Weights_l2 --> 7701.484 | Lr --> 0.001 | Seconds_per_step --> 9.700 | 
[2024-10-20 23:07:16,611][Main][INFO] - [train] Step 1725 out of 65536 | Loss --> 15.590 | Loss_ntp --> 7.728 | Loss_mlm --> 7.861 | Grad_l2 --> 22.357 | Weights_l2 --> 7701.479 | Lr --> 0.001 | Seconds_per_step --> 9.589 | 
[2024-10-20 23:11:18,435][Main][INFO] - [train] Step 1750 out of 65536 | Loss --> 15.475 | Loss_ntp --> 7.683 | Loss_mlm --> 7.792 | Grad_l2 --> 20.808 | Weights_l2 --> 7701.475 | Lr --> 0.001 | Seconds_per_step --> 9.673 | 
[2024-10-20 23:15:17,324][Main][INFO] - [train] Step 1775 out of 65536 | Loss --> 15.422 | Loss_ntp --> 7.655 | Loss_mlm --> 7.767 | Grad_l2 --> 16.928 | Weights_l2 --> 7701.470 | Lr --> 0.001 | Seconds_per_step --> 9.555 | 
[2024-10-20 23:19:17,823][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 15.370 | Loss_ntp --> 7.625 | Loss_mlm --> 7.745 | Grad_l2 --> 16.147 | Weights_l2 --> 7701.466 | Lr --> 0.001 | Seconds_per_step --> 9.620 | 
[2024-10-20 23:23:19,005][Main][INFO] - [train] Step 1825 out of 65536 | Loss --> 15.363 | Loss_ntp --> 7.629 | Loss_mlm --> 7.734 | Grad_l2 --> 19.934 | Weights_l2 --> 7701.462 | Lr --> 0.001 | Seconds_per_step --> 9.647 | 
[2024-10-20 23:27:17,933][Main][INFO] - [train] Step 1850 out of 65536 | Loss --> 15.347 | Loss_ntp --> 7.616 | Loss_mlm --> 7.732 | Grad_l2 --> 25.592 | Weights_l2 --> 7701.457 | Lr --> 0.001 | Seconds_per_step --> 9.557 | 
[2024-10-20 23:31:19,805][Main][INFO] - [train] Step 1875 out of 65536 | Loss --> 15.254 | Loss_ntp --> 7.577 | Loss_mlm --> 7.677 | Grad_l2 --> 19.500 | Weights_l2 --> 7701.453 | Lr --> 0.001 | Seconds_per_step --> 9.675 | 
[2024-10-20 23:35:18,582][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 15.204 | Loss_ntp --> 7.550 | Loss_mlm --> 7.653 | Grad_l2 --> 15.358 | Weights_l2 --> 7701.448 | Lr --> 0.001 | Seconds_per_step --> 9.551 | 
[2024-10-20 23:39:20,300][Main][INFO] - [train] Step 1925 out of 65536 | Loss --> 15.153 | Loss_ntp --> 7.525 | Loss_mlm --> 7.628 | Grad_l2 --> 13.241 | Weights_l2 --> 7701.445 | Lr --> 0.001 | Seconds_per_step --> 9.669 | 
[2024-10-20 23:43:21,680][Main][INFO] - [train] Step 1950 out of 65536 | Loss --> 15.111 | Loss_ntp --> 7.497 | Loss_mlm --> 7.614 | Grad_l2 --> 13.357 | Weights_l2 --> 7701.441 | Lr --> 0.001 | Seconds_per_step --> 9.655 | 
[2024-10-20 23:47:22,111][Main][INFO] - [train] Step 1975 out of 65536 | Loss --> 15.072 | Loss_ntp --> 7.475 | Loss_mlm --> 7.597 | Grad_l2 --> 15.485 | Weights_l2 --> 7701.437 | Lr --> 0.001 | Seconds_per_step --> 9.617 | 
[2024-10-20 23:51:21,960][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 15.061 | Loss_ntp --> 7.470 | Loss_mlm --> 7.591 | Grad_l2 --> 15.511 | Weights_l2 --> 7701.432 | Lr --> 0.001 | Seconds_per_step --> 9.594 | 
[2024-10-20 23:51:50,849][Main][INFO] - [eval] Step 2000 out of 65536 | Loss --> 15.092 | Loss_ntp --> 7.501 | Loss_mlm --> 7.591 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.883 | 
[2024-10-20 23:55:53,490][Main][INFO] - [train] Step 2025 out of 65536 | Loss --> 15.080 | Loss_ntp --> 7.479 | Loss_mlm --> 7.601 | Grad_l2 --> 17.451 | Weights_l2 --> 7701.428 | Lr --> 0.001 | Seconds_per_step --> 9.705 | 
[2024-10-20 23:59:53,747][Main][INFO] - [train] Step 2050 out of 65536 | Loss --> 14.998 | Loss_ntp --> 7.447 | Loss_mlm --> 7.551 | Grad_l2 --> 13.242 | Weights_l2 --> 7701.424 | Lr --> 0.001 | Seconds_per_step --> 9.610 | 
[2024-10-21 00:03:57,114][Main][INFO] - [train] Step 2075 out of 65536 | Loss --> 14.994 | Loss_ntp --> 7.431 | Loss_mlm --> 7.562 | Grad_l2 --> 17.409 | Weights_l2 --> 7701.419 | Lr --> 0.001 | Seconds_per_step --> 9.735 | 
[2024-10-21 00:07:56,557][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 14.993 | Loss_ntp --> 7.437 | Loss_mlm --> 7.556 | Grad_l2 --> 23.374 | Weights_l2 --> 7701.414 | Lr --> 0.001 | Seconds_per_step --> 9.578 | 
[2024-10-21 00:11:56,818][Main][INFO] - [train] Step 2125 out of 65536 | Loss --> 14.963 | Loss_ntp --> 7.428 | Loss_mlm --> 7.535 | Grad_l2 --> 24.857 | Weights_l2 --> 7701.410 | Lr --> 0.001 | Seconds_per_step --> 9.610 | 
[2024-10-21 00:15:56,927][Main][INFO] - [train] Step 2150 out of 65536 | Loss --> 14.829 | Loss_ntp --> 7.354 | Loss_mlm --> 7.474 | Grad_l2 --> 14.538 | Weights_l2 --> 7701.405 | Lr --> 0.001 | Seconds_per_step --> 9.604 | 
[2024-10-21 00:19:57,089][Main][INFO] - [train] Step 2175 out of 65536 | Loss --> 14.797 | Loss_ntp --> 7.344 | Loss_mlm --> 7.453 | Grad_l2 --> 13.598 | Weights_l2 --> 7701.400 | Lr --> 0.001 | Seconds_per_step --> 9.606 | 
[2024-10-21 00:23:58,135][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 14.774 | Loss_ntp --> 7.321 | Loss_mlm --> 7.454 | Grad_l2 --> 13.339 | Weights_l2 --> 7701.396 | Lr --> 0.001 | Seconds_per_step --> 9.642 | 
[2024-10-21 00:27:58,499][Main][INFO] - [train] Step 2225 out of 65536 | Loss --> 14.671 | Loss_ntp --> 7.284 | Loss_mlm --> 7.387 | Grad_l2 --> 13.884 | Weights_l2 --> 7701.392 | Lr --> 0.001 | Seconds_per_step --> 9.614 | 
[2024-10-21 00:31:59,596][Main][INFO] - [train] Step 2250 out of 65536 | Loss --> 14.635 | Loss_ntp --> 7.264 | Loss_mlm --> 7.371 | Grad_l2 --> 11.527 | Weights_l2 --> 7701.388 | Lr --> 0.001 | Seconds_per_step --> 9.644 | 
[2024-10-21 00:35:58,256][Main][INFO] - [train] Step 2275 out of 65536 | Loss --> 14.593 | Loss_ntp --> 7.247 | Loss_mlm --> 7.345 | Grad_l2 --> 9.993 | Weights_l2 --> 7701.384 | Lr --> 0.001 | Seconds_per_step --> 9.546 | 
[2024-10-21 00:39:59,379][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 14.543 | Loss_ntp --> 7.216 | Loss_mlm --> 7.327 | Grad_l2 --> 12.147 | Weights_l2 --> 7701.381 | Lr --> 0.001 | Seconds_per_step --> 9.644 | 
[2024-10-21 00:43:59,080][Main][INFO] - [train] Step 2325 out of 65536 | Loss --> 14.577 | Loss_ntp --> 7.231 | Loss_mlm --> 7.345 | Grad_l2 --> 12.365 | Weights_l2 --> 7701.376 | Lr --> 0.001 | Seconds_per_step --> 9.588 | 
[2024-10-21 00:47:59,811][Main][INFO] - [train] Step 2350 out of 65536 | Loss --> 14.512 | Loss_ntp --> 7.202 | Loss_mlm --> 7.310 | Grad_l2 --> 12.472 | Weights_l2 --> 7701.372 | Lr --> 0.001 | Seconds_per_step --> 9.629 | 
[2024-10-21 00:51:58,749][Main][INFO] - [train] Step 2375 out of 65536 | Loss --> 14.434 | Loss_ntp --> 7.166 | Loss_mlm --> 7.268 | Grad_l2 --> 12.198 | Weights_l2 --> 7701.368 | Lr --> 0.001 | Seconds_per_step --> 9.557 | 
[2024-10-21 00:55:58,527][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 14.390 | Loss_ntp --> 7.141 | Loss_mlm --> 7.249 | Grad_l2 --> 11.488 | Weights_l2 --> 7701.365 | Lr --> 0.001 | Seconds_per_step --> 9.591 | 
[2024-10-21 00:59:59,746][Main][INFO] - [train] Step 2425 out of 65536 | Loss --> 14.396 | Loss_ntp --> 7.142 | Loss_mlm --> 7.253 | Grad_l2 --> 11.924 | Weights_l2 --> 7701.361 | Lr --> 0.001 | Seconds_per_step --> 9.649 | 
[2024-10-21 01:03:58,922][Main][INFO] - [train] Step 2450 out of 65536 | Loss --> 14.319 | Loss_ntp --> 7.108 | Loss_mlm --> 7.211 | Grad_l2 --> 11.587 | Weights_l2 --> 7701.357 | Lr --> 0.001 | Seconds_per_step --> 9.567 | 
[2024-10-21 01:08:00,577][Main][INFO] - [train] Step 2475 out of 65536 | Loss --> 14.363 | Loss_ntp --> 7.132 | Loss_mlm --> 7.231 | Grad_l2 --> 11.854 | Weights_l2 --> 7701.353 | Lr --> 0.001 | Seconds_per_step --> 9.666 | 
[2024-10-21 01:12:00,070][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 14.333 | Loss_ntp --> 7.121 | Loss_mlm --> 7.212 | Grad_l2 --> 10.363 | Weights_l2 --> 7701.349 | Lr --> 0.001 | Seconds_per_step --> 9.580 | 
[2024-10-21 01:12:28,480][Main][INFO] - [eval] Step 2500 out of 65536 | Loss --> 14.573 | Loss_ntp --> 7.286 | Loss_mlm --> 7.287 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.404 | 
[2024-10-21 01:16:30,064][Main][INFO] - [train] Step 2525 out of 65536 | Loss --> 14.280 | Loss_ntp --> 7.089 | Loss_mlm --> 7.192 | Grad_l2 --> 13.178 | Weights_l2 --> 7701.345 | Lr --> 0.001 | Seconds_per_step --> 9.663 | 
[2024-10-21 01:20:29,018][Main][INFO] - [train] Step 2550 out of 65536 | Loss --> 14.260 | Loss_ntp --> 7.091 | Loss_mlm --> 7.169 | Grad_l2 --> 12.381 | Weights_l2 --> 7701.341 | Lr --> 0.001 | Seconds_per_step --> 9.558 | 
[2024-10-21 01:24:31,253][Main][INFO] - [train] Step 2575 out of 65536 | Loss --> 14.259 | Loss_ntp --> 7.078 | Loss_mlm --> 7.182 | Grad_l2 --> 11.247 | Weights_l2 --> 7701.337 | Lr --> 0.001 | Seconds_per_step --> 9.689 | 
[2024-10-21 01:28:31,446][Main][INFO] - [train] Step 2600 out of 65536 | Loss --> 14.259 | Loss_ntp --> 7.080 | Loss_mlm --> 7.179 | Grad_l2 --> 12.524 | Weights_l2 --> 7701.333 | Lr --> 0.001 | Seconds_per_step --> 9.608 | 
[2024-10-21 01:32:31,794][Main][INFO] - [train] Step 2625 out of 65536 | Loss --> 14.245 | Loss_ntp --> 7.068 | Loss_mlm --> 7.178 | Grad_l2 --> 12.087 | Weights_l2 --> 7701.330 | Lr --> 0.001 | Seconds_per_step --> 9.614 | 
[2024-10-21 01:36:32,411][Main][INFO] - [train] Step 2650 out of 65536 | Loss --> 14.247 | Loss_ntp --> 7.074 | Loss_mlm --> 7.173 | Grad_l2 --> 11.638 | Weights_l2 --> 7701.326 | Lr --> 0.001 | Seconds_per_step --> 9.625 | 
[2024-10-21 01:40:33,462][Main][INFO] - [train] Step 2675 out of 65536 | Loss --> 14.274 | Loss_ntp --> 7.086 | Loss_mlm --> 7.189 | Grad_l2 --> 10.415 | Weights_l2 --> 7701.322 | Lr --> 0.001 | Seconds_per_step --> 9.642 | 
[2024-10-21 01:44:33,254][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 14.276 | Loss_ntp --> 7.097 | Loss_mlm --> 7.179 | Grad_l2 --> 10.830 | Weights_l2 --> 7701.318 | Lr --> 0.001 | Seconds_per_step --> 9.592 | 
[2024-10-21 01:48:34,104][Main][INFO] - [train] Step 2725 out of 65536 | Loss --> 14.322 | Loss_ntp --> 7.117 | Loss_mlm --> 7.205 | Grad_l2 --> 11.668 | Weights_l2 --> 7701.314 | Lr --> 0.001 | Seconds_per_step --> 9.634 | 
[2024-10-21 01:52:33,834][Main][INFO] - [train] Step 2750 out of 65536 | Loss --> 14.393 | Loss_ntp --> 7.149 | Loss_mlm --> 7.244 | Grad_l2 --> 10.585 | Weights_l2 --> 7701.310 | Lr --> 0.001 | Seconds_per_step --> 9.589 | 
[2024-10-21 01:56:33,130][Main][INFO] - [train] Step 2775 out of 65536 | Loss --> 14.326 | Loss_ntp --> 7.124 | Loss_mlm --> 7.202 | Grad_l2 --> 9.862 | Weights_l2 --> 7701.306 | Lr --> 0.001 | Seconds_per_step --> 9.572 | 
[2024-10-21 02:00:34,375][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 14.354 | Loss_ntp --> 7.134 | Loss_mlm --> 7.220 | Grad_l2 --> 8.484 | Weights_l2 --> 7701.302 | Lr --> 0.001 | Seconds_per_step --> 9.650 | 
[2024-10-21 02:04:34,763][Main][INFO] - [train] Step 2825 out of 65536 | Loss --> 14.320 | Loss_ntp --> 7.118 | Loss_mlm --> 7.202 | Grad_l2 --> 11.118 | Weights_l2 --> 7701.298 | Lr --> 0.001 | Seconds_per_step --> 9.615 | 
[2024-10-21 02:08:35,157][Main][INFO] - [train] Step 2850 out of 65536 | Loss --> 14.323 | Loss_ntp --> 7.124 | Loss_mlm --> 7.199 | Grad_l2 --> 10.821 | Weights_l2 --> 7701.294 | Lr --> 0.001 | Seconds_per_step --> 9.616 | 
[2024-10-21 02:12:34,860][Main][INFO] - [train] Step 2875 out of 65536 | Loss --> 14.348 | Loss_ntp --> 7.129 | Loss_mlm --> 7.219 | Grad_l2 --> 9.481 | Weights_l2 --> 7701.291 | Lr --> 0.001 | Seconds_per_step --> 9.588 | 
[2024-10-21 02:16:36,448][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 14.413 | Loss_ntp --> 7.163 | Loss_mlm --> 7.250 | Grad_l2 --> 10.586 | Weights_l2 --> 7701.287 | Lr --> 0.001 | Seconds_per_step --> 9.663 | 
[2024-10-21 02:20:36,563][Main][INFO] - [train] Step 2925 out of 65536 | Loss --> 14.319 | Loss_ntp --> 7.113 | Loss_mlm --> 7.206 | Grad_l2 --> 9.175 | Weights_l2 --> 7701.283 | Lr --> 0.001 | Seconds_per_step --> 9.604 | 
[2024-10-21 02:24:36,522][Main][INFO] - [train] Step 2950 out of 65536 | Loss --> 14.292 | Loss_ntp --> 7.112 | Loss_mlm --> 7.179 | Grad_l2 --> 10.380 | Weights_l2 --> 7701.279 | Lr --> 0.001 | Seconds_per_step --> 9.598 | 
[2024-10-21 02:28:36,510][Main][INFO] - [train] Step 2975 out of 65536 | Loss --> 14.202 | Loss_ntp --> 7.068 | Loss_mlm --> 7.134 | Grad_l2 --> 9.622 | Weights_l2 --> 7701.276 | Lr --> 0.001 | Seconds_per_step --> 9.599 | 
[2024-10-21 02:32:38,120][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 14.214 | Loss_ntp --> 7.066 | Loss_mlm --> 7.147 | Grad_l2 --> 10.228 | Weights_l2 --> 7701.272 | Lr --> 0.001 | Seconds_per_step --> 9.664 | 
[2024-10-21 02:33:06,984][Main][INFO] - [eval] Step 3000 out of 65536 | Loss --> 14.236 | Loss_ntp --> 7.111 | Loss_mlm --> 7.125 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.858 | 
[2024-10-21 02:33:06,988][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-3000
[2024-10-21 02:33:07,000][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-10-21 02:33:13,140][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-3000/model.safetensors
[2024-10-21 02:33:21,968][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-3000/optimizer.bin
[2024-10-21 02:33:21,978][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-3000/scheduler.bin
[2024-10-21 02:33:21,979][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-3000/sampler.bin
[2024-10-21 02:33:21,981][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-3000/sampler_1.bin
[2024-10-21 02:33:21,990][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-3000/random_states_0.pkl
[2024-10-21 02:37:21,949][Main][INFO] - [train] Step 3025 out of 65536 | Loss --> 14.180 | Loss_ntp --> 7.041 | Loss_mlm --> 7.138 | Grad_l2 --> 9.928 | Weights_l2 --> 7701.268 | Lr --> 0.001 | Seconds_per_step --> 10.198 | 
[2024-10-21 02:41:23,436][Main][INFO] - [train] Step 3050 out of 65536 | Loss --> 14.163 | Loss_ntp --> 7.032 | Loss_mlm --> 7.130 | Grad_l2 --> 9.909 | Weights_l2 --> 7701.264 | Lr --> 0.001 | Seconds_per_step --> 9.659 | 
[2024-10-21 02:45:23,362][Main][INFO] - [train] Step 3075 out of 65536 | Loss --> 14.109 | Loss_ntp --> 7.016 | Loss_mlm --> 7.093 | Grad_l2 --> 10.119 | Weights_l2 --> 7701.260 | Lr --> 0.001 | Seconds_per_step --> 9.597 | 
[2024-10-21 02:49:23,828][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 14.053 | Loss_ntp --> 6.981 | Loss_mlm --> 7.072 | Grad_l2 --> 8.917 | Weights_l2 --> 7701.256 | Lr --> 0.001 | Seconds_per_step --> 9.619 | 
[2024-10-21 02:53:26,144][Main][INFO] - [train] Step 3125 out of 65536 | Loss --> 14.045 | Loss_ntp --> 6.975 | Loss_mlm --> 7.069 | Grad_l2 --> 11.184 | Weights_l2 --> 7701.252 | Lr --> 0.001 | Seconds_per_step --> 9.692 | 
[2024-10-21 02:57:25,035][Main][INFO] - [train] Step 3150 out of 65536 | Loss --> 14.006 | Loss_ntp --> 6.959 | Loss_mlm --> 7.047 | Grad_l2 --> 9.280 | Weights_l2 --> 7701.248 | Lr --> 0.001 | Seconds_per_step --> 9.555 | 
[2024-10-21 03:01:27,283][Main][INFO] - [train] Step 3175 out of 65536 | Loss --> 13.943 | Loss_ntp --> 6.924 | Loss_mlm --> 7.020 | Grad_l2 --> 8.769 | Weights_l2 --> 7701.245 | Lr --> 0.001 | Seconds_per_step --> 9.690 | 
[2024-10-21 03:05:27,701][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 13.956 | Loss_ntp --> 6.916 | Loss_mlm --> 7.040 | Grad_l2 --> 8.625 | Weights_l2 --> 7701.241 | Lr --> 0.001 | Seconds_per_step --> 9.617 | 
[2024-10-21 03:09:28,530][Main][INFO] - [train] Step 3225 out of 65536 | Loss --> 13.916 | Loss_ntp --> 6.906 | Loss_mlm --> 7.010 | Grad_l2 --> 9.378 | Weights_l2 --> 7701.238 | Lr --> 0.001 | Seconds_per_step --> 9.633 | 
[2024-10-21 03:13:28,937][Main][INFO] - [train] Step 3250 out of 65536 | Loss --> 13.849 | Loss_ntp --> 6.867 | Loss_mlm --> 6.982 | Grad_l2 --> 9.221 | Weights_l2 --> 7701.234 | Lr --> 0.001 | Seconds_per_step --> 9.616 | 
[2024-10-21 03:17:29,597][Main][INFO] - [train] Step 3275 out of 65536 | Loss --> 13.854 | Loss_ntp --> 6.869 | Loss_mlm --> 6.985 | Grad_l2 --> 8.561 | Weights_l2 --> 7701.230 | Lr --> 0.001 | Seconds_per_step --> 9.626 | 
[2024-10-21 03:21:30,034][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 13.781 | Loss_ntp --> 6.843 | Loss_mlm --> 6.938 | Grad_l2 --> 8.919 | Weights_l2 --> 7701.226 | Lr --> 0.001 | Seconds_per_step --> 9.617 | 
[2024-10-21 03:25:29,815][Main][INFO] - [train] Step 3325 out of 65536 | Loss --> 13.766 | Loss_ntp --> 6.836 | Loss_mlm --> 6.930 | Grad_l2 --> 8.129 | Weights_l2 --> 7701.223 | Lr --> 0.001 | Seconds_per_step --> 9.591 | 
[2024-10-21 03:29:30,344][Main][INFO] - [train] Step 3350 out of 65536 | Loss --> 13.726 | Loss_ntp --> 6.809 | Loss_mlm --> 6.917 | Grad_l2 --> 9.145 | Weights_l2 --> 7701.219 | Lr --> 0.001 | Seconds_per_step --> 9.620 | 
[2024-10-21 03:33:30,171][Main][INFO] - [train] Step 3375 out of 65536 | Loss --> 13.751 | Loss_ntp --> 6.819 | Loss_mlm --> 6.932 | Grad_l2 --> 11.666 | Weights_l2 --> 7701.215 | Lr --> 0.001 | Seconds_per_step --> 9.593 | 
[2024-10-21 03:37:32,111][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 13.700 | Loss_ntp --> 6.796 | Loss_mlm --> 6.905 | Grad_l2 --> 8.776 | Weights_l2 --> 7701.211 | Lr --> 0.001 | Seconds_per_step --> 9.677 | 
[2024-10-21 03:41:31,530][Main][INFO] - [train] Step 3425 out of 65536 | Loss --> 13.641 | Loss_ntp --> 6.774 | Loss_mlm --> 6.868 | Grad_l2 --> 9.206 | Weights_l2 --> 7701.207 | Lr --> 0.001 | Seconds_per_step --> 9.577 | 
[2024-10-21 03:45:33,625][Main][INFO] - [train] Step 3450 out of 65536 | Loss --> 13.588 | Loss_ntp --> 6.735 | Loss_mlm --> 6.852 | Grad_l2 --> 6.293 | Weights_l2 --> 7701.204 | Lr --> 0.001 | Seconds_per_step --> 9.684 | 
[2024-10-21 03:49:34,400][Main][INFO] - [train] Step 3475 out of 65536 | Loss --> 13.615 | Loss_ntp --> 6.748 | Loss_mlm --> 6.868 | Grad_l2 --> 9.161 | Weights_l2 --> 7701.201 | Lr --> 0.001 | Seconds_per_step --> 9.631 | 
[2024-10-21 03:53:35,824][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 13.532 | Loss_ntp --> 6.707 | Loss_mlm --> 6.825 | Grad_l2 --> 9.556 | Weights_l2 --> 7701.197 | Lr --> 0.001 | Seconds_per_step --> 9.657 | 
[2024-10-21 03:54:04,713][Main][INFO] - [eval] Step 3500 out of 65536 | Loss --> 13.912 | Loss_ntp --> 6.950 | Loss_mlm --> 6.962 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.883 | 
[2024-10-21 03:58:05,620][Main][INFO] - [train] Step 3525 out of 65536 | Loss --> 13.463 | Loss_ntp --> 6.677 | Loss_mlm --> 6.786 | Grad_l2 --> 9.458 | Weights_l2 --> 7701.193 | Lr --> 0.001 | Seconds_per_step --> 9.636 | 
[2024-10-21 04:02:06,516][Main][INFO] - [train] Step 3550 out of 65536 | Loss --> 13.419 | Loss_ntp --> 6.654 | Loss_mlm --> 6.766 | Grad_l2 --> 9.819 | Weights_l2 --> 7701.188 | Lr --> 0.001 | Seconds_per_step --> 9.636 | 
[2024-10-21 04:06:07,229][Main][INFO] - [train] Step 3575 out of 65536 | Loss --> 13.362 | Loss_ntp --> 6.626 | Loss_mlm --> 6.736 | Grad_l2 --> 8.944 | Weights_l2 --> 7701.184 | Lr --> 0.001 | Seconds_per_step --> 9.628 | 
[2024-10-21 04:10:08,761][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 13.401 | Loss_ntp --> 6.628 | Loss_mlm --> 6.773 | Grad_l2 --> 9.904 | Weights_l2 --> 7701.180 | Lr --> 0.001 | Seconds_per_step --> 9.661 | 
[2024-10-21 04:14:09,815][Main][INFO] - [train] Step 3625 out of 65536 | Loss --> 13.361 | Loss_ntp --> 6.625 | Loss_mlm --> 6.736 | Grad_l2 --> 8.507 | Weights_l2 --> 7701.176 | Lr --> 0.001 | Seconds_per_step --> 9.642 | 
[2024-10-21 04:18:10,037][Main][INFO] - [train] Step 3650 out of 65536 | Loss --> 13.355 | Loss_ntp --> 6.614 | Loss_mlm --> 6.741 | Grad_l2 --> 9.056 | Weights_l2 --> 7701.172 | Lr --> 0.001 | Seconds_per_step --> 9.609 | 
[2024-10-21 04:22:10,677][Main][INFO] - [train] Step 3675 out of 65536 | Loss --> 13.306 | Loss_ntp --> 6.586 | Loss_mlm --> 6.720 | Grad_l2 --> 9.057 | Weights_l2 --> 7701.168 | Lr --> 0.001 | Seconds_per_step --> 9.625 | 
[2024-10-21 04:26:12,857][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 13.325 | Loss_ntp --> 6.596 | Loss_mlm --> 6.729 | Grad_l2 --> 10.732 | Weights_l2 --> 7701.163 | Lr --> 0.001 | Seconds_per_step --> 9.687 | 
[2024-10-21 04:30:11,816][Main][INFO] - [train] Step 3725 out of 65536 | Loss --> 13.239 | Loss_ntp --> 6.561 | Loss_mlm --> 6.678 | Grad_l2 --> 9.810 | Weights_l2 --> 7701.160 | Lr --> 0.001 | Seconds_per_step --> 9.558 | 
[2024-10-21 04:34:12,167][Main][INFO] - [train] Step 3750 out of 65536 | Loss --> 13.211 | Loss_ntp --> 6.534 | Loss_mlm --> 6.677 | Grad_l2 --> 10.011 | Weights_l2 --> 7701.156 | Lr --> 0.001 | Seconds_per_step --> 9.614 | 
[2024-10-21 04:38:14,046][Main][INFO] - [train] Step 3775 out of 65536 | Loss --> 13.214 | Loss_ntp --> 6.537 | Loss_mlm --> 6.678 | Grad_l2 --> 8.939 | Weights_l2 --> 7701.152 | Lr --> 0.001 | Seconds_per_step --> 9.675 | 
[2024-10-21 04:42:14,454][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 13.148 | Loss_ntp --> 6.508 | Loss_mlm --> 6.640 | Grad_l2 --> 9.513 | Weights_l2 --> 7701.148 | Lr --> 0.001 | Seconds_per_step --> 9.616 | 
[2024-10-21 04:46:14,554][Main][INFO] - [train] Step 3825 out of 65536 | Loss --> 13.172 | Loss_ntp --> 6.514 | Loss_mlm --> 6.658 | Grad_l2 --> 9.295 | Weights_l2 --> 7701.144 | Lr --> 0.001 | Seconds_per_step --> 9.604 | 
[2024-10-21 04:50:14,762][Main][INFO] - [train] Step 3850 out of 65536 | Loss --> 13.118 | Loss_ntp --> 6.494 | Loss_mlm --> 6.624 | Grad_l2 --> 7.890 | Weights_l2 --> 7701.140 | Lr --> 0.001 | Seconds_per_step --> 9.608 | 
[2024-10-21 04:54:16,032][Main][INFO] - [train] Step 3875 out of 65536 | Loss --> 13.179 | Loss_ntp --> 6.521 | Loss_mlm --> 6.657 | Grad_l2 --> 9.901 | Weights_l2 --> 7701.136 | Lr --> 0.001 | Seconds_per_step --> 9.651 | 
[2024-10-21 04:58:16,128][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 13.259 | Loss_ntp --> 6.571 | Loss_mlm --> 6.687 | Grad_l2 --> 8.910 | Weights_l2 --> 7701.132 | Lr --> 0.001 | Seconds_per_step --> 9.604 |