Abhi4 committed on
Commit 2d243fe
1 Parent(s): 75f9f5f

End of training

Files changed (4)
  1. all_results.json +13 -0
  2. eval_results.json +8 -0
  3. train_results.json +8 -0
  4. trainer_state.json +2191 -0
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9615316328342309,
+ "eval_loss": 0.1066935583949089,
+ "eval_runtime": 200.7321,
+ "eval_samples_per_second": 84.177,
+ "eval_steps_per_second": 2.635,
+ "total_flos": 1.484110125228884e+19,
+ "train_loss": 0.2671308586192051,
+ "train_runtime": 11690.3858,
+ "train_samples_per_second": 39.025,
+ "train_steps_per_second": 0.305
+ }
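The training throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes the Trainer reports `train_samples_per_second` as (samples seen) / `train_runtime`, so multiplying the two recovers the sample count; `global_step` (3564) is taken from trainer_state.json in this same commit. Dividing by the step count gives an effective batch size of roughly 128 samples per optimizer step. The variable names are illustrative, and because the logged rate is rounded to three decimals, the recovered counts are approximate.

```python
# Figures copied from all_results.json; global_step comes from trainer_state.json.
train_runtime_s = 11690.3858
train_samples_per_second = 39.025
num_epochs = 3.0
global_step = 3564

# Assumption: the rate is (samples seen) / runtime, so multiplying recovers the count.
total_samples = train_runtime_s * train_samples_per_second
samples_per_epoch = total_samples / num_epochs
# Samples consumed per optimizer step, i.e. the effective batch size.
samples_per_step = total_samples / global_step

print(f"~{total_samples:,.0f} samples seen in total")
print(f"~{samples_per_epoch:,.0f} samples per epoch")
print(f"~{samples_per_step:.1f} samples per optimizer step")
```

How those ~128 samples per step split between per-device batch size, number of devices and gradient accumulation cannot be read off these files.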
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9615316328342309,
+ "eval_loss": 0.1066935583949089,
+ "eval_runtime": 200.7321,
+ "eval_samples_per_second": 84.177,
+ "eval_steps_per_second": 2.635
+ }
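The evaluation figures similarly pin down the size of the evaluation set. Assuming `eval_samples_per_second` and `eval_steps_per_second` are the sample and batch counts divided by `eval_runtime`, the sketch below recovers about 16,897 evaluation samples in 529 batches. That is consistent with a batch size of 32 (528 full batches plus one leftover sample), although the batch size is an inference, not a logged value. It also converts the final accuracy and the epoch-1 accuracy logged in trainer_state.json (0.9541930520210689) into counts of correctly classified samples.

```python
import math

# Figures copied from eval_results.json (final evaluation after epoch 3).
eval_runtime_s = 200.7321
eval_samples_per_second = 84.177
eval_steps_per_second = 2.635
eval_accuracy = 0.9615316328342309
# Accuracy from the epoch-1 evaluation entry in trainer_state.json.
epoch1_accuracy = 0.9541930520210689

# The logged rates are rounded, so round the recovered counts to integers.
num_eval_samples = round(eval_runtime_s * eval_samples_per_second)
num_eval_batches = round(eval_runtime_s * eval_steps_per_second)
# Smallest per-batch size for which that many batches cover every sample.
implied_batch_size = math.ceil(num_eval_samples / num_eval_batches)

# Correctly classified samples at the final and the epoch-1 evaluation.
num_correct = round(eval_accuracy * num_eval_samples)
epoch1_correct = round(epoch1_accuracy * num_eval_samples)

print(num_eval_samples, num_eval_batches, implied_batch_size)
print(num_correct, epoch1_correct, num_correct - epoch1_correct)
```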
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 3.0,
+ "total_flos": 1.484110125228884e+19,
+ "train_loss": 0.2671308586192051,
+ "train_runtime": 11690.3858,
+ "train_samples_per_second": 39.025,
+ "train_steps_per_second": 0.305
+ }
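The `learning_rate` values logged in trainer_state.json (the next file in this commit) follow a linear warmup to a peak, then a linear decay toward zero at the final step, 3564. The sketch below reproduces that shape. The peak of 5e-5 and the 357 warmup steps are inferred from the logged values, not read from a training config. 357 equals ceil(0.1 × 3564), which is consistent with a 10% warmup ratio, but that setting is an assumption. The formula has the same shape as `get_linear_schedule_with_warmup` in Hugging Face Transformers. `linear_warmup_decay_lr` is a hypothetical helper written for this check and is not part of that library.

```python
def linear_warmup_decay_lr(step: int,
                           peak_lr: float = 5e-5,      # inferred from the log
                           warmup_steps: int = 357,    # inferred from the log
                           total_steps: int = 3564) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# A few (step, learning_rate) pairs copied from log_history in trainer_state.json.
logged = {
    10: 1.4005602240896359e-06,
    350: 4.901960784313725e-05,
    360: 4.99532273152479e-05,
    1190: 3.7012784533832244e-05,
    2200: 2.1265980667290304e-05,
}

for step, lr in logged.items():
    predicted = linear_warmup_decay_lr(step)
    print(f"step {step:>4}: logged {lr:.6e}  predicted {predicted:.6e}")
```

For example, at step 10 the formula gives 5e-5 × 10 / 357 ≈ 1.40056e-06, and at step 360 it gives 5e-5 × 3204 / 3207 ≈ 4.99532e-05, both matching the logged entries.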
trainer_state.json ADDED
@@ -0,0 +1,2191 @@
1
+ {
2
+ "best_metric": 0.9615316328342309,
3
+ "best_model_checkpoint": "swinv2-tiny-patch4-window8-256-finetuned-eurosat/checkpoint-3564",
4
+ "epoch": 2.9993688196928256,
5
+ "eval_steps": 500,
6
+ "global_step": 3564,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.01,
13
+ "learning_rate": 1.4005602240896359e-06,
14
+ "loss": 1.3358,
15
+ "step": 10
16
+ },
17
+ {
18
+ "epoch": 0.02,
19
+ "learning_rate": 2.8011204481792718e-06,
20
+ "loss": 1.2686,
21
+ "step": 20
22
+ },
23
+ {
24
+ "epoch": 0.03,
25
+ "learning_rate": 4.2016806722689085e-06,
26
+ "loss": 1.1951,
27
+ "step": 30
28
+ },
29
+ {
30
+ "epoch": 0.03,
31
+ "learning_rate": 5.6022408963585436e-06,
32
+ "loss": 1.0686,
33
+ "step": 40
34
+ },
35
+ {
36
+ "epoch": 0.04,
37
+ "learning_rate": 7.0028011204481795e-06,
38
+ "loss": 1.0255,
39
+ "step": 50
40
+ },
41
+ {
42
+ "epoch": 0.05,
43
+ "learning_rate": 8.403361344537817e-06,
44
+ "loss": 0.9536,
45
+ "step": 60
46
+ },
47
+ {
48
+ "epoch": 0.06,
49
+ "learning_rate": 9.803921568627451e-06,
50
+ "loss": 0.8115,
51
+ "step": 70
52
+ },
53
+ {
54
+ "epoch": 0.07,
55
+ "learning_rate": 1.1204481792717087e-05,
56
+ "loss": 0.735,
57
+ "step": 80
58
+ },
59
+ {
60
+ "epoch": 0.08,
61
+ "learning_rate": 1.2605042016806723e-05,
62
+ "loss": 0.666,
63
+ "step": 90
64
+ },
65
+ {
66
+ "epoch": 0.08,
67
+ "learning_rate": 1.4005602240896359e-05,
68
+ "loss": 0.6396,
69
+ "step": 100
70
+ },
71
+ {
72
+ "epoch": 0.09,
73
+ "learning_rate": 1.5406162464985995e-05,
74
+ "loss": 0.5814,
75
+ "step": 110
76
+ },
77
+ {
78
+ "epoch": 0.1,
79
+ "learning_rate": 1.6806722689075634e-05,
80
+ "loss": 0.5461,
81
+ "step": 120
82
+ },
83
+ {
84
+ "epoch": 0.11,
85
+ "learning_rate": 1.8207282913165267e-05,
86
+ "loss": 0.5389,
87
+ "step": 130
88
+ },
89
+ {
90
+ "epoch": 0.12,
91
+ "learning_rate": 1.9607843137254903e-05,
92
+ "loss": 0.4734,
93
+ "step": 140
94
+ },
95
+ {
96
+ "epoch": 0.13,
97
+ "learning_rate": 2.100840336134454e-05,
98
+ "loss": 0.4689,
99
+ "step": 150
100
+ },
101
+ {
102
+ "epoch": 0.13,
103
+ "learning_rate": 2.2408963585434174e-05,
104
+ "loss": 0.4593,
105
+ "step": 160
106
+ },
107
+ {
108
+ "epoch": 0.14,
109
+ "learning_rate": 2.380952380952381e-05,
110
+ "loss": 0.4205,
111
+ "step": 170
112
+ },
113
+ {
114
+ "epoch": 0.15,
115
+ "learning_rate": 2.5210084033613446e-05,
116
+ "loss": 0.4288,
117
+ "step": 180
118
+ },
119
+ {
120
+ "epoch": 0.16,
121
+ "learning_rate": 2.6610644257703082e-05,
122
+ "loss": 0.4006,
123
+ "step": 190
124
+ },
125
+ {
126
+ "epoch": 0.17,
127
+ "learning_rate": 2.8011204481792718e-05,
128
+ "loss": 0.4362,
129
+ "step": 200
130
+ },
131
+ {
132
+ "epoch": 0.18,
133
+ "learning_rate": 2.9411764705882354e-05,
134
+ "loss": 0.3418,
135
+ "step": 210
136
+ },
137
+ {
138
+ "epoch": 0.19,
139
+ "learning_rate": 3.081232492997199e-05,
140
+ "loss": 0.3842,
141
+ "step": 220
142
+ },
143
+ {
144
+ "epoch": 0.19,
145
+ "learning_rate": 3.221288515406163e-05,
146
+ "loss": 0.3478,
147
+ "step": 230
148
+ },
149
+ {
150
+ "epoch": 0.2,
151
+ "learning_rate": 3.361344537815127e-05,
152
+ "loss": 0.3627,
153
+ "step": 240
154
+ },
155
+ {
156
+ "epoch": 0.21,
157
+ "learning_rate": 3.5014005602240894e-05,
158
+ "loss": 0.3705,
159
+ "step": 250
160
+ },
161
+ {
162
+ "epoch": 0.22,
163
+ "learning_rate": 3.641456582633053e-05,
164
+ "loss": 0.3953,
165
+ "step": 260
166
+ },
167
+ {
168
+ "epoch": 0.23,
169
+ "learning_rate": 3.7815126050420166e-05,
170
+ "loss": 0.38,
171
+ "step": 270
172
+ },
173
+ {
174
+ "epoch": 0.24,
175
+ "learning_rate": 3.9215686274509805e-05,
176
+ "loss": 0.3714,
177
+ "step": 280
178
+ },
179
+ {
180
+ "epoch": 0.24,
181
+ "learning_rate": 4.0616246498599444e-05,
182
+ "loss": 0.3445,
183
+ "step": 290
184
+ },
185
+ {
186
+ "epoch": 0.25,
187
+ "learning_rate": 4.201680672268908e-05,
188
+ "loss": 0.3509,
189
+ "step": 300
190
+ },
191
+ {
192
+ "epoch": 0.26,
193
+ "learning_rate": 4.3417366946778716e-05,
194
+ "loss": 0.3525,
195
+ "step": 310
196
+ },
197
+ {
198
+ "epoch": 0.27,
199
+ "learning_rate": 4.481792717086835e-05,
200
+ "loss": 0.3626,
201
+ "step": 320
202
+ },
203
+ {
204
+ "epoch": 0.28,
205
+ "learning_rate": 4.621848739495799e-05,
206
+ "loss": 0.3336,
207
+ "step": 330
208
+ },
209
+ {
210
+ "epoch": 0.29,
211
+ "learning_rate": 4.761904761904762e-05,
212
+ "loss": 0.3407,
213
+ "step": 340
214
+ },
215
+ {
216
+ "epoch": 0.29,
217
+ "learning_rate": 4.901960784313725e-05,
218
+ "loss": 0.3259,
219
+ "step": 350
220
+ },
221
+ {
222
+ "epoch": 0.3,
223
+ "learning_rate": 4.99532273152479e-05,
224
+ "loss": 0.3566,
225
+ "step": 360
226
+ },
227
+ {
228
+ "epoch": 0.31,
229
+ "learning_rate": 4.979731836607422e-05,
230
+ "loss": 0.3209,
231
+ "step": 370
232
+ },
233
+ {
234
+ "epoch": 0.32,
235
+ "learning_rate": 4.964140941690053e-05,
236
+ "loss": 0.3593,
237
+ "step": 380
238
+ },
239
+ {
240
+ "epoch": 0.33,
241
+ "learning_rate": 4.948550046772685e-05,
242
+ "loss": 0.3349,
243
+ "step": 390
244
+ },
245
+ {
246
+ "epoch": 0.34,
247
+ "learning_rate": 4.932959151855317e-05,
248
+ "loss": 0.3043,
249
+ "step": 400
250
+ },
251
+ {
252
+ "epoch": 0.35,
253
+ "learning_rate": 4.917368256937949e-05,
254
+ "loss": 0.3387,
255
+ "step": 410
256
+ },
257
+ {
258
+ "epoch": 0.35,
259
+ "learning_rate": 4.90177736202058e-05,
260
+ "loss": 0.3448,
261
+ "step": 420
262
+ },
263
+ {
264
+ "epoch": 0.36,
265
+ "learning_rate": 4.886186467103212e-05,
266
+ "loss": 0.3246,
267
+ "step": 430
268
+ },
269
+ {
270
+ "epoch": 0.37,
271
+ "learning_rate": 4.870595572185844e-05,
272
+ "loss": 0.3293,
273
+ "step": 440
274
+ },
275
+ {
276
+ "epoch": 0.38,
277
+ "learning_rate": 4.855004677268475e-05,
278
+ "loss": 0.3269,
279
+ "step": 450
280
+ },
281
+ {
282
+ "epoch": 0.39,
283
+ "learning_rate": 4.839413782351107e-05,
284
+ "loss": 0.2847,
285
+ "step": 460
286
+ },
287
+ {
288
+ "epoch": 0.4,
289
+ "learning_rate": 4.8238228874337385e-05,
290
+ "loss": 0.2986,
291
+ "step": 470
292
+ },
293
+ {
294
+ "epoch": 0.4,
295
+ "learning_rate": 4.8082319925163704e-05,
296
+ "loss": 0.3196,
297
+ "step": 480
298
+ },
299
+ {
300
+ "epoch": 0.41,
301
+ "learning_rate": 4.7926410975990024e-05,
302
+ "loss": 0.3191,
303
+ "step": 490
304
+ },
305
+ {
306
+ "epoch": 0.42,
307
+ "learning_rate": 4.777050202681634e-05,
308
+ "loss": 0.2793,
309
+ "step": 500
310
+ },
311
+ {
312
+ "epoch": 0.43,
313
+ "learning_rate": 4.761459307764266e-05,
314
+ "loss": 0.2967,
315
+ "step": 510
316
+ },
317
+ {
318
+ "epoch": 0.44,
319
+ "learning_rate": 4.7458684128468975e-05,
320
+ "loss": 0.3299,
321
+ "step": 520
322
+ },
323
+ {
324
+ "epoch": 0.45,
325
+ "learning_rate": 4.7302775179295295e-05,
326
+ "loss": 0.2825,
327
+ "step": 530
328
+ },
329
+ {
330
+ "epoch": 0.45,
331
+ "learning_rate": 4.7146866230121614e-05,
332
+ "loss": 0.3363,
333
+ "step": 540
334
+ },
335
+ {
336
+ "epoch": 0.46,
337
+ "learning_rate": 4.699095728094793e-05,
338
+ "loss": 0.2974,
339
+ "step": 550
340
+ },
341
+ {
342
+ "epoch": 0.47,
343
+ "learning_rate": 4.6835048331774246e-05,
344
+ "loss": 0.2847,
345
+ "step": 560
346
+ },
347
+ {
348
+ "epoch": 0.48,
349
+ "learning_rate": 4.667913938260056e-05,
350
+ "loss": 0.2943,
351
+ "step": 570
352
+ },
353
+ {
354
+ "epoch": 0.49,
355
+ "learning_rate": 4.652323043342688e-05,
356
+ "loss": 0.2425,
357
+ "step": 580
358
+ },
359
+ {
360
+ "epoch": 0.5,
361
+ "learning_rate": 4.63673214842532e-05,
362
+ "loss": 0.3261,
363
+ "step": 590
364
+ },
365
+ {
366
+ "epoch": 0.5,
367
+ "learning_rate": 4.621141253507952e-05,
368
+ "loss": 0.2602,
369
+ "step": 600
370
+ },
371
+ {
372
+ "epoch": 0.51,
373
+ "learning_rate": 4.605550358590584e-05,
374
+ "loss": 0.2779,
375
+ "step": 610
376
+ },
377
+ {
378
+ "epoch": 0.52,
379
+ "learning_rate": 4.589959463673215e-05,
380
+ "loss": 0.2807,
381
+ "step": 620
382
+ },
383
+ {
384
+ "epoch": 0.53,
385
+ "learning_rate": 4.574368568755847e-05,
386
+ "loss": 0.2705,
387
+ "step": 630
388
+ },
389
+ {
390
+ "epoch": 0.54,
391
+ "learning_rate": 4.558777673838479e-05,
392
+ "loss": 0.2932,
393
+ "step": 640
394
+ },
395
+ {
396
+ "epoch": 0.55,
397
+ "learning_rate": 4.54318677892111e-05,
398
+ "loss": 0.2662,
399
+ "step": 650
400
+ },
401
+ {
402
+ "epoch": 0.56,
403
+ "learning_rate": 4.527595884003742e-05,
404
+ "loss": 0.2452,
405
+ "step": 660
406
+ },
407
+ {
408
+ "epoch": 0.56,
409
+ "learning_rate": 4.512004989086373e-05,
410
+ "loss": 0.2549,
411
+ "step": 670
412
+ },
413
+ {
414
+ "epoch": 0.57,
415
+ "learning_rate": 4.496414094169005e-05,
416
+ "loss": 0.2792,
417
+ "step": 680
418
+ },
419
+ {
420
+ "epoch": 0.58,
421
+ "learning_rate": 4.480823199251637e-05,
422
+ "loss": 0.2771,
423
+ "step": 690
424
+ },
425
+ {
426
+ "epoch": 0.59,
427
+ "learning_rate": 4.465232304334269e-05,
428
+ "loss": 0.296,
429
+ "step": 700
430
+ },
431
+ {
432
+ "epoch": 0.6,
433
+ "learning_rate": 4.449641409416901e-05,
434
+ "loss": 0.2566,
435
+ "step": 710
436
+ },
437
+ {
438
+ "epoch": 0.61,
439
+ "learning_rate": 4.4340505144995324e-05,
440
+ "loss": 0.293,
441
+ "step": 720
442
+ },
443
+ {
444
+ "epoch": 0.61,
445
+ "learning_rate": 4.4184596195821643e-05,
446
+ "loss": 0.2751,
447
+ "step": 730
448
+ },
449
+ {
450
+ "epoch": 0.62,
451
+ "learning_rate": 4.4028687246647956e-05,
452
+ "loss": 0.2603,
453
+ "step": 740
454
+ },
455
+ {
456
+ "epoch": 0.63,
457
+ "learning_rate": 4.3872778297474276e-05,
458
+ "loss": 0.2984,
459
+ "step": 750
460
+ },
461
+ {
462
+ "epoch": 0.64,
463
+ "learning_rate": 4.3716869348300595e-05,
464
+ "loss": 0.2358,
465
+ "step": 760
466
+ },
467
+ {
468
+ "epoch": 0.65,
469
+ "learning_rate": 4.356096039912691e-05,
470
+ "loss": 0.2454,
471
+ "step": 770
472
+ },
473
+ {
474
+ "epoch": 0.66,
475
+ "learning_rate": 4.340505144995323e-05,
476
+ "loss": 0.2366,
477
+ "step": 780
478
+ },
479
+ {
480
+ "epoch": 0.66,
481
+ "learning_rate": 4.324914250077955e-05,
482
+ "loss": 0.2831,
483
+ "step": 790
484
+ },
485
+ {
486
+ "epoch": 0.67,
487
+ "learning_rate": 4.3093233551605866e-05,
488
+ "loss": 0.3191,
489
+ "step": 800
490
+ },
491
+ {
492
+ "epoch": 0.68,
493
+ "learning_rate": 4.2937324602432186e-05,
494
+ "loss": 0.2633,
495
+ "step": 810
496
+ },
497
+ {
498
+ "epoch": 0.69,
499
+ "learning_rate": 4.27814156532585e-05,
500
+ "loss": 0.2857,
501
+ "step": 820
502
+ },
503
+ {
504
+ "epoch": 0.7,
505
+ "learning_rate": 4.262550670408482e-05,
506
+ "loss": 0.2503,
507
+ "step": 830
508
+ },
509
+ {
510
+ "epoch": 0.71,
511
+ "learning_rate": 4.246959775491113e-05,
512
+ "loss": 0.2761,
513
+ "step": 840
514
+ },
515
+ {
516
+ "epoch": 0.72,
517
+ "learning_rate": 4.231368880573745e-05,
518
+ "loss": 0.2761,
519
+ "step": 850
520
+ },
521
+ {
522
+ "epoch": 0.72,
523
+ "learning_rate": 4.215777985656377e-05,
524
+ "loss": 0.2628,
525
+ "step": 860
526
+ },
527
+ {
528
+ "epoch": 0.73,
529
+ "learning_rate": 4.200187090739008e-05,
530
+ "loss": 0.2573,
531
+ "step": 870
532
+ },
533
+ {
534
+ "epoch": 0.74,
535
+ "learning_rate": 4.18459619582164e-05,
536
+ "loss": 0.2736,
537
+ "step": 880
538
+ },
539
+ {
540
+ "epoch": 0.75,
541
+ "learning_rate": 4.169005300904272e-05,
542
+ "loss": 0.2847,
543
+ "step": 890
544
+ },
545
+ {
546
+ "epoch": 0.76,
547
+ "learning_rate": 4.153414405986904e-05,
548
+ "loss": 0.2567,
549
+ "step": 900
550
+ },
551
+ {
552
+ "epoch": 0.77,
553
+ "learning_rate": 4.137823511069536e-05,
554
+ "loss": 0.256,
555
+ "step": 910
556
+ },
557
+ {
558
+ "epoch": 0.77,
559
+ "learning_rate": 4.122232616152167e-05,
560
+ "loss": 0.2705,
561
+ "step": 920
562
+ },
563
+ {
564
+ "epoch": 0.78,
565
+ "learning_rate": 4.106641721234799e-05,
566
+ "loss": 0.2551,
567
+ "step": 930
568
+ },
569
+ {
570
+ "epoch": 0.79,
571
+ "learning_rate": 4.0910508263174305e-05,
572
+ "loss": 0.2679,
573
+ "step": 940
574
+ },
575
+ {
576
+ "epoch": 0.8,
577
+ "learning_rate": 4.0754599314000624e-05,
578
+ "loss": 0.2729,
579
+ "step": 950
580
+ },
581
+ {
582
+ "epoch": 0.81,
583
+ "learning_rate": 4.0598690364826944e-05,
584
+ "loss": 0.2574,
585
+ "step": 960
586
+ },
587
+ {
588
+ "epoch": 0.82,
589
+ "learning_rate": 4.0442781415653257e-05,
590
+ "loss": 0.2625,
591
+ "step": 970
592
+ },
593
+ {
594
+ "epoch": 0.82,
595
+ "learning_rate": 4.028687246647958e-05,
596
+ "loss": 0.2673,
597
+ "step": 980
598
+ },
599
+ {
600
+ "epoch": 0.83,
601
+ "learning_rate": 4.0130963517305895e-05,
602
+ "loss": 0.2618,
603
+ "step": 990
604
+ },
605
+ {
606
+ "epoch": 0.84,
607
+ "learning_rate": 3.9975054568132215e-05,
608
+ "loss": 0.2675,
609
+ "step": 1000
610
+ },
611
+ {
612
+ "epoch": 0.85,
613
+ "learning_rate": 3.9819145618958534e-05,
614
+ "loss": 0.2439,
615
+ "step": 1010
616
+ },
617
+ {
618
+ "epoch": 0.86,
619
+ "learning_rate": 3.966323666978485e-05,
620
+ "loss": 0.2537,
621
+ "step": 1020
622
+ },
623
+ {
624
+ "epoch": 0.87,
625
+ "learning_rate": 3.9507327720611167e-05,
626
+ "loss": 0.2383,
627
+ "step": 1030
628
+ },
629
+ {
630
+ "epoch": 0.88,
631
+ "learning_rate": 3.935141877143748e-05,
632
+ "loss": 0.2545,
633
+ "step": 1040
634
+ },
635
+ {
636
+ "epoch": 0.88,
637
+ "learning_rate": 3.91955098222638e-05,
638
+ "loss": 0.2303,
639
+ "step": 1050
640
+ },
641
+ {
642
+ "epoch": 0.89,
643
+ "learning_rate": 3.903960087309012e-05,
644
+ "loss": 0.2312,
645
+ "step": 1060
646
+ },
647
+ {
648
+ "epoch": 0.9,
649
+ "learning_rate": 3.888369192391643e-05,
650
+ "loss": 0.2846,
651
+ "step": 1070
652
+ },
653
+ {
654
+ "epoch": 0.91,
655
+ "learning_rate": 3.872778297474276e-05,
656
+ "loss": 0.2839,
657
+ "step": 1080
658
+ },
659
+ {
660
+ "epoch": 0.92,
661
+ "learning_rate": 3.857187402556907e-05,
662
+ "loss": 0.2489,
663
+ "step": 1090
664
+ },
665
+ {
666
+ "epoch": 0.93,
667
+ "learning_rate": 3.841596507639539e-05,
668
+ "loss": 0.2635,
669
+ "step": 1100
670
+ },
671
+ {
672
+ "epoch": 0.93,
673
+ "learning_rate": 3.82600561272217e-05,
674
+ "loss": 0.2591,
675
+ "step": 1110
676
+ },
677
+ {
678
+ "epoch": 0.94,
679
+ "learning_rate": 3.810414717804802e-05,
680
+ "loss": 0.2256,
681
+ "step": 1120
682
+ },
683
+ {
684
+ "epoch": 0.95,
685
+ "learning_rate": 3.794823822887434e-05,
686
+ "loss": 0.2216,
687
+ "step": 1130
688
+ },
689
+ {
690
+ "epoch": 0.96,
691
+ "learning_rate": 3.7792329279700654e-05,
692
+ "loss": 0.2441,
693
+ "step": 1140
694
+ },
695
+ {
696
+ "epoch": 0.97,
697
+ "learning_rate": 3.763642033052697e-05,
698
+ "loss": 0.2335,
699
+ "step": 1150
700
+ },
701
+ {
702
+ "epoch": 0.98,
703
+ "learning_rate": 3.748051138135329e-05,
704
+ "loss": 0.2729,
705
+ "step": 1160
706
+ },
707
+ {
708
+ "epoch": 0.98,
709
+ "learning_rate": 3.7324602432179605e-05,
710
+ "loss": 0.2336,
711
+ "step": 1170
712
+ },
713
+ {
714
+ "epoch": 0.99,
715
+ "learning_rate": 3.716869348300593e-05,
716
+ "loss": 0.2183,
717
+ "step": 1180
718
+ },
719
+ {
720
+ "epoch": 1.0,
721
+ "eval_accuracy": 0.9541930520210689,
722
+ "eval_loss": 0.12830650806427002,
723
+ "eval_runtime": 310.446,
724
+ "eval_samples_per_second": 54.428,
725
+ "eval_steps_per_second": 1.704,
726
+ "step": 1188
727
+ },
728
+ {
729
+ "epoch": 1.0,
730
+ "learning_rate": 3.7012784533832244e-05,
731
+ "loss": 0.2144,
732
+ "step": 1190
733
+ },
734
+ {
735
+ "epoch": 1.01,
736
+ "learning_rate": 3.6856875584658564e-05,
737
+ "loss": 0.2249,
738
+ "step": 1200
739
+ },
740
+ {
741
+ "epoch": 1.02,
742
+ "learning_rate": 3.6700966635484876e-05,
743
+ "loss": 0.2721,
744
+ "step": 1210
745
+ },
746
+ {
747
+ "epoch": 1.03,
748
+ "learning_rate": 3.6545057686311196e-05,
749
+ "loss": 0.2508,
750
+ "step": 1220
751
+ },
752
+ {
753
+ "epoch": 1.04,
754
+ "learning_rate": 3.6389148737137515e-05,
755
+ "loss": 0.2781,
756
+ "step": 1230
757
+ },
758
+ {
759
+ "epoch": 1.04,
760
+ "learning_rate": 3.623323978796383e-05,
761
+ "loss": 0.233,
762
+ "step": 1240
763
+ },
764
+ {
765
+ "epoch": 1.05,
766
+ "learning_rate": 3.607733083879015e-05,
767
+ "loss": 0.2861,
768
+ "step": 1250
769
+ },
770
+ {
771
+ "epoch": 1.06,
772
+ "learning_rate": 3.592142188961646e-05,
773
+ "loss": 0.2512,
774
+ "step": 1260
775
+ },
776
+ {
777
+ "epoch": 1.07,
778
+ "learning_rate": 3.5765512940442786e-05,
779
+ "loss": 0.2532,
780
+ "step": 1270
781
+ },
782
+ {
783
+ "epoch": 1.08,
784
+ "learning_rate": 3.5609603991269106e-05,
785
+ "loss": 0.2321,
786
+ "step": 1280
787
+ },
788
+ {
789
+ "epoch": 1.09,
790
+ "learning_rate": 3.545369504209542e-05,
791
+ "loss": 0.2722,
792
+ "step": 1290
793
+ },
794
+ {
795
+ "epoch": 1.09,
796
+ "learning_rate": 3.529778609292174e-05,
797
+ "loss": 0.2788,
798
+ "step": 1300
799
+ },
800
+ {
801
+ "epoch": 1.1,
802
+ "learning_rate": 3.514187714374805e-05,
803
+ "loss": 0.2427,
804
+ "step": 1310
805
+ },
806
+ {
807
+ "epoch": 1.11,
808
+ "learning_rate": 3.498596819457437e-05,
809
+ "loss": 0.2556,
810
+ "step": 1320
811
+ },
812
+ {
813
+ "epoch": 1.12,
814
+ "learning_rate": 3.483005924540069e-05,
815
+ "loss": 0.2325,
816
+ "step": 1330
817
+ },
818
+ {
819
+ "epoch": 1.13,
820
+ "learning_rate": 3.4674150296227e-05,
821
+ "loss": 0.2431,
822
+ "step": 1340
823
+ },
824
+ {
825
+ "epoch": 1.14,
826
+ "learning_rate": 3.451824134705332e-05,
827
+ "loss": 0.2179,
828
+ "step": 1350
829
+ },
830
+ {
831
+ "epoch": 1.14,
832
+ "learning_rate": 3.4362332397879635e-05,
833
+ "loss": 0.2631,
834
+ "step": 1360
835
+ },
836
+ {
837
+ "epoch": 1.15,
838
+ "learning_rate": 3.420642344870596e-05,
839
+ "loss": 0.2296,
840
+ "step": 1370
841
+ },
842
+ {
843
+ "epoch": 1.16,
844
+ "learning_rate": 3.4050514499532273e-05,
845
+ "loss": 0.2425,
846
+ "step": 1380
847
+ },
848
+ {
849
+ "epoch": 1.17,
850
+ "learning_rate": 3.389460555035859e-05,
851
+ "loss": 0.2098,
852
+ "step": 1390
853
+ },
854
+ {
855
+ "epoch": 1.18,
856
+ "learning_rate": 3.373869660118491e-05,
857
+ "loss": 0.2488,
858
+ "step": 1400
859
+ },
860
+ {
861
+ "epoch": 1.19,
862
+ "learning_rate": 3.3582787652011225e-05,
863
+ "loss": 0.2532,
864
+ "step": 1410
865
+ },
866
+ {
867
+ "epoch": 1.2,
868
+ "learning_rate": 3.3426878702837545e-05,
869
+ "loss": 0.2135,
870
+ "step": 1420
871
+ },
872
+ {
873
+ "epoch": 1.2,
874
+ "learning_rate": 3.3270969753663864e-05,
875
+ "loss": 0.2263,
876
+ "step": 1430
877
+ },
878
+ {
879
+ "epoch": 1.21,
880
+ "learning_rate": 3.311506080449018e-05,
881
+ "loss": 0.2859,
882
+ "step": 1440
883
+ },
884
+ {
885
+ "epoch": 1.22,
886
+ "learning_rate": 3.2959151855316496e-05,
887
+ "loss": 0.2486,
888
+ "step": 1450
889
+ },
890
+ {
891
+ "epoch": 1.23,
892
+ "learning_rate": 3.280324290614281e-05,
893
+ "loss": 0.264,
894
+ "step": 1460
895
+ },
896
+ {
897
+ "epoch": 1.24,
898
+ "learning_rate": 3.2647333956969135e-05,
899
+ "loss": 0.2348,
900
+ "step": 1470
901
+ },
902
+ {
903
+ "epoch": 1.25,
904
+ "learning_rate": 3.249142500779545e-05,
905
+ "loss": 0.2529,
906
+ "step": 1480
907
+ },
908
+ {
909
+ "epoch": 1.25,
910
+ "learning_rate": 3.233551605862177e-05,
911
+ "loss": 0.2424,
912
+ "step": 1490
913
+ },
914
+ {
915
+ "epoch": 1.26,
916
+ "learning_rate": 3.217960710944809e-05,
917
+ "loss": 0.215,
918
+ "step": 1500
919
+ },
920
+ {
921
+ "epoch": 1.27,
922
+ "learning_rate": 3.20236981602744e-05,
923
+ "loss": 0.242,
924
+ "step": 1510
925
+ },
926
+ {
927
+ "epoch": 1.28,
928
+ "learning_rate": 3.186778921110072e-05,
929
+ "loss": 0.2328,
930
+ "step": 1520
931
+ },
932
+ {
933
+ "epoch": 1.29,
934
+ "learning_rate": 3.171188026192703e-05,
935
+ "loss": 0.2489,
936
+ "step": 1530
937
+ },
938
+ {
939
+ "epoch": 1.3,
940
+ "learning_rate": 3.155597131275335e-05,
941
+ "loss": 0.213,
942
+ "step": 1540
943
+ },
944
+ {
945
+ "epoch": 1.3,
946
+ "learning_rate": 3.140006236357967e-05,
947
+ "loss": 0.2347,
948
+ "step": 1550
949
+ },
950
+ {
951
+ "epoch": 1.31,
952
+ "learning_rate": 3.124415341440599e-05,
953
+ "loss": 0.2158,
954
+ "step": 1560
955
+ },
956
+ {
957
+ "epoch": 1.32,
958
+ "learning_rate": 3.108824446523231e-05,
959
+ "loss": 0.2464,
960
+ "step": 1570
961
+ },
962
+ {
963
+ "epoch": 1.33,
964
+ "learning_rate": 3.093233551605862e-05,
965
+ "loss": 0.2185,
966
+ "step": 1580
967
+ },
968
+ {
969
+ "epoch": 1.34,
970
+ "learning_rate": 3.077642656688494e-05,
971
+ "loss": 0.2299,
972
+ "step": 1590
973
+ },
974
+ {
975
+ "epoch": 1.35,
976
+ "learning_rate": 3.062051761771126e-05,
977
+ "loss": 0.2343,
978
+ "step": 1600
979
+ },
980
+ {
981
+ "epoch": 1.35,
982
+ "learning_rate": 3.0464608668537574e-05,
983
+ "loss": 0.2384,
984
+ "step": 1610
985
+ },
986
+ {
987
+ "epoch": 1.36,
988
+ "learning_rate": 3.0308699719363893e-05,
989
+ "loss": 0.2382,
990
+ "step": 1620
991
+ },
992
+ {
993
+ "epoch": 1.37,
994
+ "learning_rate": 3.015279077019021e-05,
995
+ "loss": 0.21,
996
+ "step": 1630
997
+ },
998
+ {
999
+ "epoch": 1.38,
1000
+ "learning_rate": 2.9996881821016526e-05,
1001
+ "loss": 0.2803,
1002
+ "step": 1640
1003
+ },
1004
+ {
1005
+ "epoch": 1.39,
1006
+ "learning_rate": 2.984097287184284e-05,
1007
+ "loss": 0.2284,
1008
+ "step": 1650
1009
+ },
1010
+ {
1011
+ "epoch": 1.4,
1012
+ "learning_rate": 2.9685063922669164e-05,
1013
+ "loss": 0.223,
1014
+ "step": 1660
1015
+ },
1016
+ {
1017
+ "epoch": 1.41,
1018
+ "learning_rate": 2.952915497349548e-05,
1019
+ "loss": 0.2582,
1020
+ "step": 1670
1021
+ },
1022
+ {
1023
+ "epoch": 1.41,
1024
+ "learning_rate": 2.93732460243218e-05,
1025
+ "loss": 0.2435,
1026
+ "step": 1680
1027
+ },
1028
+ {
1029
+ "epoch": 1.42,
1030
+ "learning_rate": 2.9217337075148116e-05,
1031
+ "loss": 0.2211,
1032
+ "step": 1690
1033
+ },
1034
+ {
1035
+ "epoch": 1.43,
1036
+ "learning_rate": 2.9061428125974432e-05,
1037
+ "loss": 0.2541,
1038
+ "step": 1700
1039
+ },
1040
+ {
1041
+ "epoch": 1.44,
1042
+ "learning_rate": 2.8905519176800748e-05,
1043
+ "loss": 0.2283,
1044
+ "step": 1710
1045
+ },
1046
+ {
1047
+ "epoch": 1.45,
1048
+ "learning_rate": 2.8749610227627068e-05,
1049
+ "loss": 0.2357,
1050
+ "step": 1720
1051
+ },
1052
+ {
1053
+ "epoch": 1.46,
1054
+ "learning_rate": 2.8593701278453384e-05,
1055
+ "loss": 0.2264,
1056
+ "step": 1730
1057
+ },
1058
+ {
1059
+ "epoch": 1.46,
1060
+ "learning_rate": 2.84377923292797e-05,
1061
+ "loss": 0.2504,
1062
+ "step": 1740
1063
+ },
1064
+ {
1065
+ "epoch": 1.47,
1066
+ "learning_rate": 2.8281883380106016e-05,
1067
+ "loss": 0.2338,
1068
+ "step": 1750
1069
+ },
1070
+ {
1071
+ "epoch": 1.48,
1072
+ "learning_rate": 2.812597443093234e-05,
1073
+ "loss": 0.2298,
1074
+ "step": 1760
1075
+ },
1076
+ {
1077
+ "epoch": 1.49,
1078
+ "learning_rate": 2.7970065481758655e-05,
1079
+ "loss": 0.2282,
1080
+ "step": 1770
1081
+ },
1082
+ {
1083
+ "epoch": 1.5,
1084
+ "learning_rate": 2.7814156532584974e-05,
1085
+ "loss": 0.2402,
1086
+ "step": 1780
1087
+ },
1088
+ {
1089
+ "epoch": 1.51,
1090
+ "learning_rate": 2.765824758341129e-05,
1091
+ "loss": 0.2186,
1092
+ "step": 1790
1093
+ },
1094
+ {
1095
+ "epoch": 1.51,
1096
+ "learning_rate": 2.7502338634237607e-05,
1097
+ "loss": 0.2131,
1098
+ "step": 1800
1099
+ },
1100
+ {
1101
+ "epoch": 1.52,
1102
+ "learning_rate": 2.7346429685063923e-05,
1103
+ "loss": 0.2254,
1104
+ "step": 1810
1105
+ },
1106
+ {
1107
+ "epoch": 1.53,
1108
+ "learning_rate": 2.719052073589024e-05,
1109
+ "loss": 0.2042,
1110
+ "step": 1820
1111
+ },
1112
+ {
1113
+ "epoch": 1.54,
1114
+ "learning_rate": 2.7034611786716558e-05,
1115
+ "loss": 0.2537,
1116
+ "step": 1830
1117
+ },
1118
+ {
1119
+ "epoch": 1.55,
1120
+ "learning_rate": 2.6878702837542874e-05,
1121
+ "loss": 0.2386,
1122
+ "step": 1840
1123
+ },
1124
+ {
1125
+ "epoch": 1.56,
1126
+ "learning_rate": 2.6722793888369197e-05,
1127
+ "loss": 0.2017,
1128
+ "step": 1850
1129
+ },
1130
+ {
1131
+ "epoch": 1.57,
1132
+ "learning_rate": 2.6566884939195513e-05,
1133
+ "loss": 0.1893,
1134
+ "step": 1860
1135
+ },
1136
+ {
1137
+ "epoch": 1.57,
1138
+ "learning_rate": 2.641097599002183e-05,
1139
+ "loss": 0.2256,
1140
+ "step": 1870
1141
+ },
1142
+ {
1143
+ "epoch": 1.58,
1144
+ "learning_rate": 2.6255067040848145e-05,
1145
+ "loss": 0.2499,
1146
+ "step": 1880
1147
+ },
1148
+ {
1149
+ "epoch": 1.59,
1150
+ "learning_rate": 2.6099158091674465e-05,
1151
+ "loss": 0.2405,
1152
+ "step": 1890
1153
+ },
1154
+ {
1155
+ "epoch": 1.6,
1156
+ "learning_rate": 2.594324914250078e-05,
1157
+ "loss": 0.2505,
1158
+ "step": 1900
+ },
+ {
+ "epoch": 1.61,
+ "learning_rate": 2.5787340193327097e-05,
+ "loss": 0.2102,
+ "step": 1910
+ },
+ {
+ "epoch": 1.62,
+ "learning_rate": 2.5631431244153413e-05,
+ "loss": 0.2263,
+ "step": 1920
+ },
+ {
+ "epoch": 1.62,
+ "learning_rate": 2.5475522294979733e-05,
+ "loss": 0.2314,
+ "step": 1930
+ },
+ {
+ "epoch": 1.63,
+ "learning_rate": 2.531961334580605e-05,
+ "loss": 0.2166,
+ "step": 1940
+ },
+ {
+ "epoch": 1.64,
+ "learning_rate": 2.516370439663237e-05,
+ "loss": 0.2281,
+ "step": 1950
+ },
+ {
+ "epoch": 1.65,
+ "learning_rate": 2.5007795447458688e-05,
+ "loss": 0.2122,
+ "step": 1960
+ },
+ {
+ "epoch": 1.66,
+ "learning_rate": 2.4851886498285e-05,
+ "loss": 0.2381,
+ "step": 1970
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 2.469597754911132e-05,
+ "loss": 0.2209,
+ "step": 1980
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 2.454006859993764e-05,
+ "loss": 0.2274,
+ "step": 1990
+ },
+ {
+ "epoch": 1.68,
+ "learning_rate": 2.4384159650763955e-05,
+ "loss": 0.2267,
+ "step": 2000
+ },
+ {
+ "epoch": 1.69,
+ "learning_rate": 2.422825070159027e-05,
+ "loss": 0.2145,
+ "step": 2010
+ },
+ {
+ "epoch": 1.7,
+ "learning_rate": 2.407234175241659e-05,
+ "loss": 0.2255,
+ "step": 2020
+ },
+ {
+ "epoch": 1.71,
+ "learning_rate": 2.3916432803242907e-05,
+ "loss": 0.2436,
+ "step": 2030
+ },
+ {
+ "epoch": 1.72,
+ "learning_rate": 2.3760523854069226e-05,
+ "loss": 0.2258,
+ "step": 2040
+ },
+ {
+ "epoch": 1.73,
+ "learning_rate": 2.3604614904895543e-05,
+ "loss": 0.2201,
+ "step": 2050
+ },
+ {
+ "epoch": 1.73,
+ "learning_rate": 2.344870595572186e-05,
+ "loss": 0.211,
+ "step": 2060
+ },
+ {
+ "epoch": 1.74,
+ "learning_rate": 2.3292797006548178e-05,
+ "loss": 0.2884,
+ "step": 2070
+ },
+ {
+ "epoch": 1.75,
+ "learning_rate": 2.3136888057374494e-05,
+ "loss": 0.2147,
+ "step": 2080
+ },
+ {
+ "epoch": 1.76,
+ "learning_rate": 2.298097910820081e-05,
+ "loss": 0.2356,
+ "step": 2090
+ },
+ {
+ "epoch": 1.77,
+ "learning_rate": 2.282507015902713e-05,
+ "loss": 0.214,
+ "step": 2100
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 2.2669161209853446e-05,
+ "loss": 0.221,
+ "step": 2110
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 2.2513252260679765e-05,
+ "loss": 0.2366,
+ "step": 2120
+ },
+ {
+ "epoch": 1.79,
+ "learning_rate": 2.235734331150608e-05,
+ "loss": 0.2046,
+ "step": 2130
+ },
+ {
+ "epoch": 1.8,
+ "learning_rate": 2.2201434362332397e-05,
+ "loss": 0.2113,
+ "step": 2140
+ },
+ {
+ "epoch": 1.81,
+ "learning_rate": 2.2045525413158717e-05,
+ "loss": 0.217,
+ "step": 2150
+ },
+ {
+ "epoch": 1.82,
+ "learning_rate": 2.1889616463985033e-05,
+ "loss": 0.1887,
+ "step": 2160
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 2.1733707514811352e-05,
+ "loss": 0.1934,
+ "step": 2170
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 2.157779856563767e-05,
+ "loss": 0.2295,
+ "step": 2180
+ },
+ {
+ "epoch": 1.84,
+ "learning_rate": 2.1421889616463985e-05,
+ "loss": 0.2161,
+ "step": 2190
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 2.1265980667290304e-05,
+ "loss": 0.2304,
+ "step": 2200
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 2.111007171811662e-05,
+ "loss": 0.2535,
+ "step": 2210
+ },
+ {
+ "epoch": 1.87,
+ "learning_rate": 2.095416276894294e-05,
+ "loss": 0.2198,
+ "step": 2220
+ },
+ {
+ "epoch": 1.88,
+ "learning_rate": 2.0798253819769256e-05,
+ "loss": 0.2422,
+ "step": 2230
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 2.0642344870595572e-05,
+ "loss": 0.2231,
+ "step": 2240
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 2.048643592142189e-05,
+ "loss": 0.2184,
+ "step": 2250
+ },
+ {
+ "epoch": 1.9,
+ "learning_rate": 2.0330526972248207e-05,
+ "loss": 0.2294,
+ "step": 2260
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 2.0174618023074527e-05,
+ "loss": 0.1991,
+ "step": 2270
+ },
+ {
+ "epoch": 1.92,
+ "learning_rate": 2.0018709073900843e-05,
+ "loss": 0.2187,
+ "step": 2280
+ },
+ {
+ "epoch": 1.93,
+ "learning_rate": 1.986280012472716e-05,
+ "loss": 0.1991,
+ "step": 2290
+ },
+ {
+ "epoch": 1.94,
+ "learning_rate": 1.970689117555348e-05,
+ "loss": 0.1823,
+ "step": 2300
+ },
+ {
+ "epoch": 1.94,
+ "learning_rate": 1.9550982226379798e-05,
+ "loss": 0.237,
+ "step": 2310
+ },
+ {
+ "epoch": 1.95,
+ "learning_rate": 1.9395073277206114e-05,
+ "loss": 0.2189,
+ "step": 2320
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 1.923916432803243e-05,
+ "loss": 0.2173,
+ "step": 2330
+ },
+ {
+ "epoch": 1.97,
+ "learning_rate": 1.9083255378858746e-05,
+ "loss": 0.2319,
+ "step": 2340
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 1.8927346429685062e-05,
+ "loss": 0.237,
+ "step": 2350
+ },
+ {
+ "epoch": 1.99,
+ "learning_rate": 1.8771437480511385e-05,
+ "loss": 0.2368,
+ "step": 2360
+ },
+ {
+ "epoch": 1.99,
+ "learning_rate": 1.86155285313377e-05,
+ "loss": 0.2099,
+ "step": 2370
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.9570337929810026,
+ "eval_loss": 0.11882077902555466,
+ "eval_runtime": 304.3918,
+ "eval_samples_per_second": 55.511,
+ "eval_steps_per_second": 1.738,
+ "step": 2376
+ },
+ {
+ "epoch": 2.0,
+ "learning_rate": 1.8459619582164017e-05,
+ "loss": 0.2455,
+ "step": 2380
+ },
+ {
+ "epoch": 2.01,
+ "learning_rate": 1.8303710632990333e-05,
+ "loss": 0.2108,
+ "step": 2390
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 1.814780168381665e-05,
+ "loss": 0.1959,
+ "step": 2400
+ },
+ {
+ "epoch": 2.03,
+ "learning_rate": 1.799189273464297e-05,
+ "loss": 0.2015,
+ "step": 2410
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 1.783598378546929e-05,
+ "loss": 0.1989,
+ "step": 2420
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 1.7680074836295604e-05,
+ "loss": 0.1953,
+ "step": 2430
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 1.752416588712192e-05,
+ "loss": 0.2103,
+ "step": 2440
+ },
+ {
+ "epoch": 2.06,
+ "learning_rate": 1.7368256937948237e-05,
+ "loss": 0.2186,
+ "step": 2450
+ },
+ {
+ "epoch": 2.07,
+ "learning_rate": 1.7212347988774556e-05,
+ "loss": 0.2499,
+ "step": 2460
+ },
+ {
+ "epoch": 2.08,
+ "learning_rate": 1.7056439039600876e-05,
+ "loss": 0.234,
+ "step": 2470
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 1.690053009042719e-05,
+ "loss": 0.2212,
+ "step": 2480
+ },
+ {
+ "epoch": 2.1,
+ "learning_rate": 1.6744621141253508e-05,
+ "loss": 0.2011,
+ "step": 2490
+ },
+ {
+ "epoch": 2.1,
+ "learning_rate": 1.6588712192079824e-05,
+ "loss": 0.2273,
+ "step": 2500
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 1.6432803242906143e-05,
+ "loss": 0.2124,
+ "step": 2510
+ },
+ {
+ "epoch": 2.12,
+ "learning_rate": 1.6276894293732463e-05,
+ "loss": 0.1946,
+ "step": 2520
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 1.612098534455878e-05,
+ "loss": 0.2414,
+ "step": 2530
+ },
+ {
+ "epoch": 2.14,
+ "learning_rate": 1.5965076395385095e-05,
+ "loss": 0.2319,
+ "step": 2540
+ },
+ {
+ "epoch": 2.15,
+ "learning_rate": 1.580916744621141e-05,
+ "loss": 0.1864,
+ "step": 2550
+ },
+ {
+ "epoch": 2.15,
+ "learning_rate": 1.565325849703773e-05,
+ "loss": 0.2163,
+ "step": 2560
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 1.549734954786405e-05,
+ "loss": 0.223,
+ "step": 2570
+ },
+ {
+ "epoch": 2.17,
+ "learning_rate": 1.5341440598690366e-05,
+ "loss": 0.202,
+ "step": 2580
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 1.5185531649516682e-05,
+ "loss": 0.2173,
+ "step": 2590
+ },
+ {
+ "epoch": 2.19,
+ "learning_rate": 1.5029622700343002e-05,
+ "loss": 0.2205,
+ "step": 2600
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 1.487371375116932e-05,
+ "loss": 0.2409,
+ "step": 2610
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 1.4717804801995635e-05,
+ "loss": 0.2188,
+ "step": 2620
+ },
+ {
+ "epoch": 2.21,
+ "learning_rate": 1.4561895852821952e-05,
+ "loss": 0.2067,
+ "step": 2630
+ },
+ {
+ "epoch": 2.22,
+ "learning_rate": 1.440598690364827e-05,
+ "loss": 0.2117,
+ "step": 2640
+ },
+ {
+ "epoch": 2.23,
+ "learning_rate": 1.4250077954474589e-05,
+ "loss": 0.2101,
+ "step": 2650
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 1.4094169005300905e-05,
+ "loss": 0.222,
+ "step": 2660
+ },
+ {
+ "epoch": 2.25,
+ "learning_rate": 1.3938260056127223e-05,
+ "loss": 0.2403,
+ "step": 2670
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 1.3782351106953539e-05,
+ "loss": 0.2037,
+ "step": 2680
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 1.3626442157779856e-05,
+ "loss": 0.2064,
+ "step": 2690
+ },
+ {
+ "epoch": 2.27,
+ "learning_rate": 1.3470533208606176e-05,
+ "loss": 0.2301,
+ "step": 2700
+ },
+ {
+ "epoch": 2.28,
+ "learning_rate": 1.3314624259432492e-05,
+ "loss": 0.1992,
+ "step": 2710
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 1.315871531025881e-05,
+ "loss": 0.1956,
+ "step": 2720
+ },
+ {
+ "epoch": 2.3,
+ "learning_rate": 1.3002806361085126e-05,
+ "loss": 0.1989,
+ "step": 2730
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 1.2846897411911444e-05,
+ "loss": 0.2183,
+ "step": 2740
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 1.2690988462737763e-05,
+ "loss": 0.2311,
+ "step": 2750
+ },
+ {
+ "epoch": 2.32,
+ "learning_rate": 1.253507951356408e-05,
+ "loss": 0.2116,
+ "step": 2760
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 1.2379170564390397e-05,
+ "loss": 0.1907,
+ "step": 2770
+ },
+ {
+ "epoch": 2.34,
+ "learning_rate": 1.2223261615216713e-05,
+ "loss": 0.2253,
+ "step": 2780
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 1.2067352666043031e-05,
+ "loss": 0.2089,
+ "step": 2790
+ },
+ {
+ "epoch": 2.36,
+ "learning_rate": 1.1911443716869349e-05,
+ "loss": 0.2271,
+ "step": 2800
+ },
+ {
+ "epoch": 2.36,
+ "learning_rate": 1.1755534767695666e-05,
+ "loss": 0.2093,
+ "step": 2810
+ },
+ {
+ "epoch": 2.37,
+ "learning_rate": 1.1599625818521984e-05,
+ "loss": 0.2115,
+ "step": 2820
+ },
+ {
+ "epoch": 2.38,
+ "learning_rate": 1.14437168693483e-05,
+ "loss": 0.2228,
+ "step": 2830
+ },
+ {
+ "epoch": 2.39,
+ "learning_rate": 1.1287807920174618e-05,
+ "loss": 0.2002,
+ "step": 2840
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 1.1131898971000936e-05,
+ "loss": 0.212,
+ "step": 2850
+ },
+ {
+ "epoch": 2.41,
+ "learning_rate": 1.0975990021827254e-05,
+ "loss": 0.2017,
+ "step": 2860
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 1.0820081072653571e-05,
+ "loss": 0.2283,
+ "step": 2870
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 1.066417212347989e-05,
+ "loss": 0.215,
+ "step": 2880
+ },
+ {
+ "epoch": 2.43,
+ "learning_rate": 1.0508263174306205e-05,
+ "loss": 0.2146,
+ "step": 2890
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 1.0352354225132523e-05,
+ "loss": 0.2017,
+ "step": 2900
+ },
+ {
+ "epoch": 2.45,
+ "learning_rate": 1.019644527595884e-05,
+ "loss": 0.2078,
+ "step": 2910
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 1.0040536326785157e-05,
+ "loss": 0.2092,
+ "step": 2920
+ },
+ {
+ "epoch": 2.47,
+ "learning_rate": 9.884627377611476e-06,
+ "loss": 0.1804,
+ "step": 2930
+ },
+ {
+ "epoch": 2.47,
+ "learning_rate": 9.728718428437792e-06,
+ "loss": 0.1877,
+ "step": 2940
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 9.57280947926411e-06,
+ "loss": 0.216,
+ "step": 2950
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 9.416900530090428e-06,
+ "loss": 0.2339,
+ "step": 2960
+ },
+ {
+ "epoch": 2.5,
+ "learning_rate": 9.260991580916744e-06,
+ "loss": 0.2199,
+ "step": 2970
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 9.105082631743064e-06,
+ "loss": 0.2094,
+ "step": 2980
+ },
+ {
+ "epoch": 2.52,
+ "learning_rate": 8.94917368256938e-06,
+ "loss": 0.2139,
+ "step": 2990
+ },
+ {
+ "epoch": 2.52,
+ "learning_rate": 8.793264733395697e-06,
+ "loss": 0.2141,
+ "step": 3000
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 8.637355784222015e-06,
+ "loss": 0.1935,
+ "step": 3010
+ },
+ {
+ "epoch": 2.54,
+ "learning_rate": 8.481446835048331e-06,
+ "loss": 0.1691,
+ "step": 3020
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 8.32553788587465e-06,
+ "loss": 0.2109,
+ "step": 3030
+ },
+ {
+ "epoch": 2.56,
+ "learning_rate": 8.169628936700967e-06,
+ "loss": 0.2142,
+ "step": 3040
+ },
+ {
+ "epoch": 2.57,
+ "learning_rate": 8.013719987527285e-06,
+ "loss": 0.221,
+ "step": 3050
+ },
+ {
+ "epoch": 2.58,
+ "learning_rate": 7.857811038353602e-06,
+ "loss": 0.1718,
+ "step": 3060
+ },
+ {
+ "epoch": 2.58,
+ "learning_rate": 7.701902089179918e-06,
+ "loss": 0.2,
+ "step": 3070
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 7.545993140006237e-06,
+ "loss": 0.1856,
+ "step": 3080
+ },
+ {
+ "epoch": 2.6,
+ "learning_rate": 7.390084190832554e-06,
+ "loss": 0.1998,
+ "step": 3090
+ },
+ {
+ "epoch": 2.61,
+ "learning_rate": 7.234175241658872e-06,
+ "loss": 0.1969,
+ "step": 3100
+ },
+ {
+ "epoch": 2.62,
+ "learning_rate": 7.078266292485189e-06,
+ "loss": 0.1886,
+ "step": 3110
+ },
+ {
+ "epoch": 2.63,
+ "learning_rate": 6.922357343311506e-06,
+ "loss": 0.1732,
+ "step": 3120
+ },
+ {
+ "epoch": 2.63,
+ "learning_rate": 6.766448394137824e-06,
+ "loss": 0.2157,
+ "step": 3130
+ },
+ {
+ "epoch": 2.64,
+ "learning_rate": 6.610539444964141e-06,
+ "loss": 0.214,
+ "step": 3140
+ },
+ {
+ "epoch": 2.65,
+ "learning_rate": 6.454630495790459e-06,
+ "loss": 0.1919,
+ "step": 3150
+ },
+ {
+ "epoch": 2.66,
+ "learning_rate": 6.298721546616776e-06,
+ "loss": 0.2152,
+ "step": 3160
+ },
+ {
+ "epoch": 2.67,
+ "learning_rate": 6.142812597443094e-06,
+ "loss": 0.1989,
+ "step": 3170
+ },
+ {
+ "epoch": 2.68,
+ "learning_rate": 5.9869036482694114e-06,
+ "loss": 0.1901,
+ "step": 3180
+ },
+ {
+ "epoch": 2.68,
+ "learning_rate": 5.830994699095728e-06,
+ "loss": 0.2068,
+ "step": 3190
+ },
+ {
+ "epoch": 2.69,
+ "learning_rate": 5.675085749922046e-06,
+ "loss": 0.1901,
+ "step": 3200
+ },
+ {
+ "epoch": 2.7,
+ "learning_rate": 5.519176800748363e-06,
+ "loss": 0.2091,
+ "step": 3210
+ },
+ {
+ "epoch": 2.71,
+ "learning_rate": 5.363267851574681e-06,
+ "loss": 0.206,
+ "step": 3220
+ },
+ {
+ "epoch": 2.72,
+ "learning_rate": 5.207358902400998e-06,
+ "loss": 0.2135,
+ "step": 3230
+ },
+ {
+ "epoch": 2.73,
+ "learning_rate": 5.0514499532273156e-06,
+ "loss": 0.2252,
+ "step": 3240
+ },
+ {
+ "epoch": 2.74,
+ "learning_rate": 4.895541004053633e-06,
+ "loss": 0.1835,
+ "step": 3250
+ },
+ {
+ "epoch": 2.74,
+ "learning_rate": 4.73963205487995e-06,
+ "loss": 0.1613,
+ "step": 3260
+ },
+ {
+ "epoch": 2.75,
+ "learning_rate": 4.583723105706267e-06,
+ "loss": 0.2149,
+ "step": 3270
+ },
+ {
+ "epoch": 2.76,
+ "learning_rate": 4.427814156532585e-06,
+ "loss": 0.1654,
+ "step": 3280
+ },
+ {
+ "epoch": 2.77,
+ "learning_rate": 4.271905207358903e-06,
+ "loss": 0.2135,
+ "step": 3290
+ },
+ {
+ "epoch": 2.78,
+ "learning_rate": 4.1159962581852205e-06,
+ "loss": 0.1936,
+ "step": 3300
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 3.9600873090115375e-06,
+ "loss": 0.2129,
+ "step": 3310
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 3.8041783598378544e-06,
+ "loss": 0.2319,
+ "step": 3320
+ },
+ {
+ "epoch": 2.8,
+ "learning_rate": 3.648269410664172e-06,
+ "loss": 0.237,
+ "step": 3330
+ },
+ {
+ "epoch": 2.81,
+ "learning_rate": 3.4923604614904895e-06,
+ "loss": 0.2078,
+ "step": 3340
+ },
+ {
+ "epoch": 2.82,
+ "learning_rate": 3.3364515123168073e-06,
+ "loss": 0.1914,
+ "step": 3350
+ },
+ {
+ "epoch": 2.83,
+ "learning_rate": 3.1805425631431246e-06,
+ "loss": 0.1764,
+ "step": 3360
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 3.024633613969442e-06,
+ "loss": 0.2198,
+ "step": 3370
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 2.8687246647957593e-06,
+ "loss": 0.2305,
+ "step": 3380
+ },
+ {
+ "epoch": 2.85,
+ "learning_rate": 2.7128157156220767e-06,
+ "loss": 0.1813,
+ "step": 3390
+ },
+ {
+ "epoch": 2.86,
+ "learning_rate": 2.556906766448394e-06,
+ "loss": 0.174,
+ "step": 3400
+ },
+ {
+ "epoch": 2.87,
+ "learning_rate": 2.400997817274712e-06,
+ "loss": 0.1667,
+ "step": 3410
+ },
+ {
+ "epoch": 2.88,
+ "learning_rate": 2.2450888681010288e-06,
+ "loss": 0.2069,
+ "step": 3420
+ },
+ {
+ "epoch": 2.89,
+ "learning_rate": 2.0891799189273465e-06,
+ "loss": 0.2115,
+ "step": 3430
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 1.933270969753664e-06,
+ "loss": 0.2187,
+ "step": 3440
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 1.7773620205799812e-06,
+ "loss": 0.2239,
+ "step": 3450
+ },
+ {
+ "epoch": 2.91,
+ "learning_rate": 1.6214530714062988e-06,
+ "loss": 0.2191,
+ "step": 3460
+ },
+ {
+ "epoch": 2.92,
+ "learning_rate": 1.4655441222326162e-06,
+ "loss": 0.194,
+ "step": 3470
+ },
+ {
+ "epoch": 2.93,
+ "learning_rate": 1.3096351730589337e-06,
+ "loss": 0.1813,
+ "step": 3480
+ },
+ {
+ "epoch": 2.94,
+ "learning_rate": 1.153726223885251e-06,
+ "loss": 0.2153,
+ "step": 3490
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 9.978172747115684e-07,
+ "loss": 0.1929,
+ "step": 3500
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 8.41908325537886e-07,
+ "loss": 0.2387,
+ "step": 3510
+ },
+ {
+ "epoch": 2.96,
+ "learning_rate": 6.859993763642034e-07,
+ "loss": 0.1916,
+ "step": 3520
+ },
+ {
+ "epoch": 2.97,
+ "learning_rate": 5.300904271905207e-07,
+ "loss": 0.2146,
+ "step": 3530
+ },
+ {
+ "epoch": 2.98,
+ "learning_rate": 3.741814780168382e-07,
+ "loss": 0.1924,
+ "step": 3540
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 2.182725288431556e-07,
+ "loss": 0.1937,
+ "step": 3550
+ },
+ {
+ "epoch": 3.0,
+ "learning_rate": 6.236357966947303e-08,
+ "loss": 0.1668,
+ "step": 3560
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9615316328342309,
+ "eval_loss": 0.1066935583949089,
+ "eval_runtime": 302.9553,
+ "eval_samples_per_second": 55.774,
+ "eval_steps_per_second": 1.746,
+ "step": 3564
+ },
+ {
+ "epoch": 3.0,
+ "step": 3564,
+ "total_flos": 1.484110125228884e+19,
+ "train_loss": 0.2671308586192051,
+ "train_runtime": 11690.3858,
+ "train_samples_per_second": 39.025,
+ "train_steps_per_second": 0.305
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 3564,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "total_flos": 1.484110125228884e+19,
+ "trial_name": null,
+ "trial_params": null
+ }
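The `learning_rate` values in `log_history` fall by a constant amount (about 1.559e-07 per 10 steps) and reach zero at `max_steps` (3564), which is what a linear warmup-then-decay schedule would produce. The Python sketch below reproduces a few logged values under two assumptions that are inferred from the numbers and are not recorded in this file: a peak learning rate of 5e-5 (`PEAK_LR`) and 357 warmup steps (`WARMUP_STEPS`, which would correspond to a 10% warmup ratio rounded up). The helper names are hypothetical; this is an illustration, not the training configuration, which this commit does not include.

```python
import math

# Hypothetical reconstruction of the learning-rate schedule behind log_history.
# MAX_STEPS comes from "max_steps" in trainer_state.json. PEAK_LR and
# WARMUP_STEPS are ASSUMPTIONS inferred from the logged values; they are
# not stored anywhere in this file.
MAX_STEPS = 3564
PEAK_LR = 5e-5       # assumed peak learning rate
WARMUP_STEPS = 357   # assumed warmup length (10% of MAX_STEPS, rounded up)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0.0, (MAX_STEPS - step) / max(1, MAX_STEPS - WARMUP_STEPS))
    return PEAK_LR * remaining


# A few (step, learning_rate) pairs copied from log_history above.
LOGGED = {
    1910: 2.5787340193327097e-05,
    2380: 1.8459619582164017e-05,
    3000: 8.793264733395697e-06,
    3560: 6.236357966947303e-08,
}


def max_relative_error(logged: dict) -> float:
    """Largest relative gap between logged and predicted learning rates."""
    return max(abs(linear_lr(s) - lr) / lr for s, lr in logged.items())


if __name__ == "__main__":
    for step, lr in LOGGED.items():
        print(f"step {step}: logged {lr:.6e}  predicted {linear_lr(step):.6e}")
    print(f"max relative error: {max_relative_error(LOGGED):.1e}")
```

If either assumed value were wrong, the relative error would be far larger than floating-point rounding; with these values the predictions agree with the logged rates to many significant digits.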