itsLeen committed on
Commit c2b9437 · verified · 1 Parent(s): 9389aab

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
 license: apache-2.0
 base_model: dima806/deepfake_vs_real_image_detection
 tags:
+- image-classification
 - generated_from_trainer
 datasets:
 - imagefolder
@@ -14,7 +15,7 @@ model-index:
       name: Image Classification
       type: image-classification
     dataset:
-      name: imagefolder
+      name: ai_real_images
       type: imagefolder
       config: default
       split: train
@@ -22,7 +23,7 @@ model-index:
     metrics:
     - name: Accuracy
       type: accuracy
-      value: 0.8654545454545455
+      value: 0.8518181818181818
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -30,10 +31,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # realFake-img
 
-This model is a fine-tuned version of [dima806/deepfake_vs_real_image_detection](https://huggingface.co/dima806/deepfake_vs_real_image_detection) on the imagefolder dataset.
+This model is a fine-tuned version of [dima806/deepfake_vs_real_image_detection](https://huggingface.co/dima806/deepfake_vs_real_image_detection) on the ai_real_images dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4350
-- Accuracy: 0.8655
+- Loss: 0.3329
+- Accuracy: 0.8518
 
 ## Model description
 
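For readers who want to try the updated model, the following is a minimal inference sketch using the transformers image-classification pipeline. The Hub repo id `itsLeen/realFake-img` is an assumption inferred from the commit author and the model name and may differ, and `example.jpg` is a placeholder path.

```python
from transformers import pipeline
from PIL import Image

# Hypothetical repo id inferred from the commit author and model name;
# replace with the actual Hub path if it differs.
classifier = pipeline("image-classification", model="itsLeen/realFake-img")

image = Image.open("example.jpg")  # placeholder path to any RGB test image
for prediction in classifier(image):
    print(f"{prediction['label']}: {prediction['score']:.4f}")
```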
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 4.0,
+     "eval_accuracy": 0.8518181818181818,
+     "eval_loss": 0.33288896083831787,
+     "eval_runtime": 24.2544,
+     "eval_samples_per_second": 45.353,
+     "eval_steps_per_second": 5.69,
+     "total_flos": 1.9301704773202575e+18,
+     "train_loss": 0.266804637053074,
+     "train_runtime": 1157.865,
+     "train_samples_per_second": 21.512,
+     "train_steps_per_second": 1.347
+ }
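all_results.json is the Trainer's end-of-run summary; as a quick illustration, it can be read back programmatically like this (a minimal sketch, assuming the file has been downloaded to the working directory):

```python
import json

# Read the Trainer's end-of-run summary (all_results.json above).
with open("all_results.json") as f:
    results = json.load(f)

print(f"eval accuracy : {results['eval_accuracy']:.4f}")    # 0.8518
print(f"eval loss     : {results['eval_loss']:.4f}")        # 0.3329
print(f"train runtime : {results['train_runtime']:.0f} s")  # ~1158 s
```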
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 4.0,
+     "eval_accuracy": 0.8518181818181818,
+     "eval_loss": 0.33288896083831787,
+     "eval_runtime": 24.2544,
+     "eval_samples_per_second": 45.353,
+     "eval_steps_per_second": 5.69
+ }
runs/Sep04_16-25-43_eec317420151/events.out.tfevents.1725468448.eec317420151.967.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76e1315ff7a25a92d56b47cb210475c69e8adb0848ba80e1e2402fe09dffa799
+ size 411
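The added file is a Git LFS pointer to a TensorBoard event log rather than the log itself. As a hedged sketch (assuming the `tensorboard` package is installed and the real file has been fetched with `git lfs pull`), the scalars it contains can be listed like this:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point the accumulator at the run directory from the repo; the file above
# is only an LFS pointer, so the actual event data must be pulled first.
acc = EventAccumulator("runs/Sep04_16-25-43_eec317420151")
acc.Reload()

# Tag names depend on how the Trainer logged them (e.g. "eval/accuracy").
print(acc.Tags()["scalars"])
```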
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 4.0,
+     "total_flos": 1.9301704773202575e+18,
+     "train_loss": 0.266804637053074,
+     "train_runtime": 1157.865,
+     "train_samples_per_second": 21.512,
+     "train_steps_per_second": 1.347
+ }
trainer_state.json ADDED
@@ -0,0 +1,1269 @@
1
+ {
2
+ "best_metric": 0.33288896083831787,
3
+ "best_model_checkpoint": "realFake-img/checkpoint-700",
4
+ "epoch": 4.0,
5
+ "eval_steps": 100,
6
+ "global_step": 1560,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02564102564102564,
13
+ "grad_norm": 2.928205728530884,
14
+ "learning_rate": 0.00019871794871794874,
15
+ "loss": 1.1209,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.05128205128205128,
20
+ "grad_norm": 3.1282591819763184,
21
+ "learning_rate": 0.00019743589743589744,
22
+ "loss": 0.6435,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.07692307692307693,
27
+ "grad_norm": 2.2042582035064697,
28
+ "learning_rate": 0.00019615384615384615,
29
+ "loss": 0.6513,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.10256410256410256,
34
+ "grad_norm": 4.91236686706543,
35
+ "learning_rate": 0.00019487179487179487,
36
+ "loss": 0.7536,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.1282051282051282,
41
+ "grad_norm": 2.019882917404175,
42
+ "learning_rate": 0.0001935897435897436,
43
+ "loss": 0.6197,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.15384615384615385,
48
+ "grad_norm": 2.4161789417266846,
49
+ "learning_rate": 0.00019230769230769233,
50
+ "loss": 0.5531,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.1794871794871795,
55
+ "grad_norm": 2.189767360687256,
56
+ "learning_rate": 0.00019102564102564104,
57
+ "loss": 0.4985,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.20512820512820512,
62
+ "grad_norm": 3.2690813541412354,
63
+ "learning_rate": 0.00018974358974358974,
64
+ "loss": 0.5499,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.23076923076923078,
69
+ "grad_norm": 1.6541521549224854,
70
+ "learning_rate": 0.00018846153846153847,
71
+ "loss": 0.5365,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.2564102564102564,
76
+ "grad_norm": 1.9506237506866455,
77
+ "learning_rate": 0.0001871794871794872,
78
+ "loss": 0.4892,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.2564102564102564,
83
+ "eval_accuracy": 0.7227272727272728,
84
+ "eval_loss": 0.5756350159645081,
85
+ "eval_runtime": 24.0859,
86
+ "eval_samples_per_second": 45.67,
87
+ "eval_steps_per_second": 5.73,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.28205128205128205,
92
+ "grad_norm": 3.7501442432403564,
93
+ "learning_rate": 0.0001858974358974359,
94
+ "loss": 0.4306,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.3076923076923077,
99
+ "grad_norm": 2.692314386367798,
100
+ "learning_rate": 0.00018461538461538463,
101
+ "loss": 0.5122,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 0.3333333333333333,
106
+ "grad_norm": 2.5989458560943604,
107
+ "learning_rate": 0.00018333333333333334,
108
+ "loss": 0.4974,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 0.358974358974359,
113
+ "grad_norm": 1.8959237337112427,
114
+ "learning_rate": 0.00018205128205128207,
115
+ "loss": 0.4464,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 0.38461538461538464,
120
+ "grad_norm": 1.9950543642044067,
121
+ "learning_rate": 0.00018076923076923077,
122
+ "loss": 0.4366,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 0.41025641025641024,
127
+ "grad_norm": 1.4334654808044434,
128
+ "learning_rate": 0.0001794871794871795,
129
+ "loss": 0.5248,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 0.4358974358974359,
134
+ "grad_norm": 1.7422298192977905,
135
+ "learning_rate": 0.00017820512820512823,
136
+ "loss": 0.4314,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 0.46153846153846156,
141
+ "grad_norm": 1.0945038795471191,
142
+ "learning_rate": 0.00017692307692307693,
143
+ "loss": 0.3464,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 0.48717948717948717,
148
+ "grad_norm": 2.3801605701446533,
149
+ "learning_rate": 0.00017564102564102566,
150
+ "loss": 0.4059,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 0.5128205128205128,
155
+ "grad_norm": 1.6624411344528198,
156
+ "learning_rate": 0.00017435897435897436,
157
+ "loss": 0.683,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.5128205128205128,
162
+ "eval_accuracy": 0.6372727272727273,
163
+ "eval_loss": 0.6742109656333923,
164
+ "eval_runtime": 24.1183,
165
+ "eval_samples_per_second": 45.609,
166
+ "eval_steps_per_second": 5.722,
167
+ "step": 200
168
+ },
169
+ {
170
+ "epoch": 0.5384615384615384,
171
+ "grad_norm": 3.1722240447998047,
172
+ "learning_rate": 0.0001730769230769231,
173
+ "loss": 0.4548,
174
+ "step": 210
175
+ },
176
+ {
177
+ "epoch": 0.5641025641025641,
178
+ "grad_norm": 2.0260214805603027,
179
+ "learning_rate": 0.0001717948717948718,
180
+ "loss": 0.4612,
181
+ "step": 220
182
+ },
183
+ {
184
+ "epoch": 0.5897435897435898,
185
+ "grad_norm": 2.1661155223846436,
186
+ "learning_rate": 0.00017051282051282053,
187
+ "loss": 0.3824,
188
+ "step": 230
189
+ },
190
+ {
191
+ "epoch": 0.6153846153846154,
192
+ "grad_norm": 2.2094335556030273,
193
+ "learning_rate": 0.00016923076923076923,
194
+ "loss": 0.3815,
195
+ "step": 240
196
+ },
197
+ {
198
+ "epoch": 0.6410256410256411,
199
+ "grad_norm": 2.571754217147827,
200
+ "learning_rate": 0.00016794871794871796,
201
+ "loss": 0.3743,
202
+ "step": 250
203
+ },
204
+ {
205
+ "epoch": 0.6666666666666666,
206
+ "grad_norm": 1.545766830444336,
207
+ "learning_rate": 0.0001666666666666667,
208
+ "loss": 0.3708,
209
+ "step": 260
210
+ },
211
+ {
212
+ "epoch": 0.6923076923076923,
213
+ "grad_norm": 2.742072343826294,
214
+ "learning_rate": 0.0001653846153846154,
215
+ "loss": 0.3729,
216
+ "step": 270
217
+ },
218
+ {
219
+ "epoch": 0.717948717948718,
220
+ "grad_norm": 2.677527904510498,
221
+ "learning_rate": 0.0001641025641025641,
222
+ "loss": 0.3555,
223
+ "step": 280
224
+ },
225
+ {
226
+ "epoch": 0.7435897435897436,
227
+ "grad_norm": 1.6417704820632935,
228
+ "learning_rate": 0.00016282051282051282,
229
+ "loss": 0.4212,
230
+ "step": 290
231
+ },
232
+ {
233
+ "epoch": 0.7692307692307693,
234
+ "grad_norm": 2.6961071491241455,
235
+ "learning_rate": 0.00016153846153846155,
236
+ "loss": 0.3737,
237
+ "step": 300
238
+ },
239
+ {
240
+ "epoch": 0.7692307692307693,
241
+ "eval_accuracy": 0.7554545454545455,
242
+ "eval_loss": 0.5462190508842468,
243
+ "eval_runtime": 24.1467,
244
+ "eval_samples_per_second": 45.555,
245
+ "eval_steps_per_second": 5.715,
246
+ "step": 300
247
+ },
248
+ {
249
+ "epoch": 0.7948717948717948,
250
+ "grad_norm": 3.5049915313720703,
251
+ "learning_rate": 0.00016025641025641028,
252
+ "loss": 0.3995,
253
+ "step": 310
254
+ },
255
+ {
256
+ "epoch": 0.8205128205128205,
257
+ "grad_norm": 2.599503517150879,
258
+ "learning_rate": 0.00015897435897435896,
259
+ "loss": 0.3805,
260
+ "step": 320
261
+ },
262
+ {
263
+ "epoch": 0.8461538461538461,
264
+ "grad_norm": 2.3924107551574707,
265
+ "learning_rate": 0.0001576923076923077,
266
+ "loss": 0.3586,
267
+ "step": 330
268
+ },
269
+ {
270
+ "epoch": 0.8717948717948718,
271
+ "grad_norm": 3.0169456005096436,
272
+ "learning_rate": 0.00015641025641025642,
273
+ "loss": 0.3964,
274
+ "step": 340
275
+ },
276
+ {
277
+ "epoch": 0.8974358974358975,
278
+ "grad_norm": 3.907792091369629,
279
+ "learning_rate": 0.00015512820512820515,
280
+ "loss": 0.3915,
281
+ "step": 350
282
+ },
283
+ {
284
+ "epoch": 0.9230769230769231,
285
+ "grad_norm": 1.2597954273223877,
286
+ "learning_rate": 0.00015384615384615385,
287
+ "loss": 0.3471,
288
+ "step": 360
289
+ },
290
+ {
291
+ "epoch": 0.9487179487179487,
292
+ "grad_norm": 4.323169231414795,
293
+ "learning_rate": 0.00015256410256410255,
294
+ "loss": 0.2476,
295
+ "step": 370
296
+ },
297
+ {
298
+ "epoch": 0.9743589743589743,
299
+ "grad_norm": 3.7260568141937256,
300
+ "learning_rate": 0.00015128205128205128,
301
+ "loss": 0.41,
302
+ "step": 380
303
+ },
304
+ {
305
+ "epoch": 1.0,
306
+ "grad_norm": 0.22700244188308716,
307
+ "learning_rate": 0.00015000000000000001,
308
+ "loss": 0.3773,
309
+ "step": 390
310
+ },
311
+ {
312
+ "epoch": 1.0256410256410255,
313
+ "grad_norm": 1.8512117862701416,
314
+ "learning_rate": 0.00014871794871794872,
315
+ "loss": 0.3554,
316
+ "step": 400
317
+ },
318
+ {
319
+ "epoch": 1.0256410256410255,
320
+ "eval_accuracy": 0.8009090909090909,
321
+ "eval_loss": 0.4354061186313629,
322
+ "eval_runtime": 24.0034,
323
+ "eval_samples_per_second": 45.827,
324
+ "eval_steps_per_second": 5.749,
325
+ "step": 400
326
+ },
327
+ {
328
+ "epoch": 1.0512820512820513,
329
+ "grad_norm": 1.9039109945297241,
330
+ "learning_rate": 0.00014743589743589745,
331
+ "loss": 0.3033,
332
+ "step": 410
333
+ },
334
+ {
335
+ "epoch": 1.0769230769230769,
336
+ "grad_norm": 2.7765700817108154,
337
+ "learning_rate": 0.00014615384615384615,
338
+ "loss": 0.2594,
339
+ "step": 420
340
+ },
341
+ {
342
+ "epoch": 1.1025641025641026,
343
+ "grad_norm": 2.247612953186035,
344
+ "learning_rate": 0.00014487179487179488,
345
+ "loss": 0.3434,
346
+ "step": 430
347
+ },
348
+ {
349
+ "epoch": 1.1282051282051282,
350
+ "grad_norm": 1.161192536354065,
351
+ "learning_rate": 0.0001435897435897436,
352
+ "loss": 0.2653,
353
+ "step": 440
354
+ },
355
+ {
356
+ "epoch": 1.1538461538461537,
357
+ "grad_norm": 2.9420008659362793,
358
+ "learning_rate": 0.0001423076923076923,
359
+ "loss": 0.3184,
360
+ "step": 450
361
+ },
362
+ {
363
+ "epoch": 1.1794871794871795,
364
+ "grad_norm": 2.359160900115967,
365
+ "learning_rate": 0.00014102564102564104,
366
+ "loss": 0.4049,
367
+ "step": 460
368
+ },
369
+ {
370
+ "epoch": 1.205128205128205,
371
+ "grad_norm": 1.5929157733917236,
372
+ "learning_rate": 0.00013974358974358974,
373
+ "loss": 0.371,
374
+ "step": 470
375
+ },
376
+ {
377
+ "epoch": 1.2307692307692308,
378
+ "grad_norm": 3.8561315536499023,
379
+ "learning_rate": 0.00013846153846153847,
380
+ "loss": 0.2864,
381
+ "step": 480
382
+ },
383
+ {
384
+ "epoch": 1.2564102564102564,
385
+ "grad_norm": 2.111147403717041,
386
+ "learning_rate": 0.00013717948717948718,
387
+ "loss": 0.3173,
388
+ "step": 490
389
+ },
390
+ {
391
+ "epoch": 1.282051282051282,
392
+ "grad_norm": 1.6825300455093384,
393
+ "learning_rate": 0.0001358974358974359,
394
+ "loss": 0.2368,
395
+ "step": 500
396
+ },
397
+ {
398
+ "epoch": 1.282051282051282,
399
+ "eval_accuracy": 0.8309090909090909,
400
+ "eval_loss": 0.4046396017074585,
401
+ "eval_runtime": 24.1395,
402
+ "eval_samples_per_second": 45.568,
403
+ "eval_steps_per_second": 5.717,
404
+ "step": 500
405
+ },
406
+ {
407
+ "epoch": 1.3076923076923077,
408
+ "grad_norm": 1.7883163690567017,
409
+ "learning_rate": 0.00013461538461538464,
410
+ "loss": 0.2616,
411
+ "step": 510
412
+ },
413
+ {
414
+ "epoch": 1.3333333333333333,
415
+ "grad_norm": 3.3474502563476562,
416
+ "learning_rate": 0.00013333333333333334,
417
+ "loss": 0.292,
418
+ "step": 520
419
+ },
420
+ {
421
+ "epoch": 1.358974358974359,
422
+ "grad_norm": 1.9872941970825195,
423
+ "learning_rate": 0.00013205128205128204,
424
+ "loss": 0.3284,
425
+ "step": 530
426
+ },
427
+ {
428
+ "epoch": 1.3846153846153846,
429
+ "grad_norm": 1.508928894996643,
430
+ "learning_rate": 0.00013076923076923077,
431
+ "loss": 0.3132,
432
+ "step": 540
433
+ },
434
+ {
435
+ "epoch": 1.4102564102564101,
436
+ "grad_norm": 2.3678171634674072,
437
+ "learning_rate": 0.0001294871794871795,
438
+ "loss": 0.2698,
439
+ "step": 550
440
+ },
441
+ {
442
+ "epoch": 1.435897435897436,
443
+ "grad_norm": 3.961099147796631,
444
+ "learning_rate": 0.00012820512820512823,
445
+ "loss": 0.2522,
446
+ "step": 560
447
+ },
448
+ {
449
+ "epoch": 1.4615384615384617,
450
+ "grad_norm": 1.7161500453948975,
451
+ "learning_rate": 0.00012692307692307693,
452
+ "loss": 0.3514,
453
+ "step": 570
454
+ },
455
+ {
456
+ "epoch": 1.4871794871794872,
457
+ "grad_norm": 2.0210063457489014,
458
+ "learning_rate": 0.00012564102564102564,
459
+ "loss": 0.2064,
460
+ "step": 580
461
+ },
462
+ {
463
+ "epoch": 1.5128205128205128,
464
+ "grad_norm": 1.9867080450057983,
465
+ "learning_rate": 0.00012435897435897437,
466
+ "loss": 0.3796,
467
+ "step": 590
468
+ },
469
+ {
470
+ "epoch": 1.5384615384615383,
471
+ "grad_norm": 5.6288981437683105,
472
+ "learning_rate": 0.0001230769230769231,
473
+ "loss": 0.3696,
474
+ "step": 600
475
+ },
476
+ {
477
+ "epoch": 1.5384615384615383,
478
+ "eval_accuracy": 0.7809090909090909,
479
+ "eval_loss": 0.5547047257423401,
480
+ "eval_runtime": 22.8194,
481
+ "eval_samples_per_second": 48.205,
482
+ "eval_steps_per_second": 6.047,
483
+ "step": 600
484
+ },
485
+ {
486
+ "epoch": 1.564102564102564,
487
+ "grad_norm": 2.3821280002593994,
488
+ "learning_rate": 0.00012179487179487179,
489
+ "loss": 0.169,
490
+ "step": 610
491
+ },
492
+ {
493
+ "epoch": 1.5897435897435899,
494
+ "grad_norm": 2.7712907791137695,
495
+ "learning_rate": 0.00012051282051282052,
496
+ "loss": 0.294,
497
+ "step": 620
498
+ },
499
+ {
500
+ "epoch": 1.6153846153846154,
501
+ "grad_norm": 2.803150177001953,
502
+ "learning_rate": 0.00011923076923076923,
503
+ "loss": 0.3544,
504
+ "step": 630
505
+ },
506
+ {
507
+ "epoch": 1.641025641025641,
508
+ "grad_norm": 2.660898447036743,
509
+ "learning_rate": 0.00011794871794871796,
510
+ "loss": 0.2811,
511
+ "step": 640
512
+ },
513
+ {
514
+ "epoch": 1.6666666666666665,
515
+ "grad_norm": 1.4722263813018799,
516
+ "learning_rate": 0.00011666666666666668,
517
+ "loss": 0.26,
518
+ "step": 650
519
+ },
520
+ {
521
+ "epoch": 1.6923076923076923,
522
+ "grad_norm": 2.1373977661132812,
523
+ "learning_rate": 0.00011538461538461538,
524
+ "loss": 0.2282,
525
+ "step": 660
526
+ },
527
+ {
528
+ "epoch": 1.717948717948718,
529
+ "grad_norm": 4.172289848327637,
530
+ "learning_rate": 0.0001141025641025641,
531
+ "loss": 0.2673,
532
+ "step": 670
533
+ },
534
+ {
535
+ "epoch": 1.7435897435897436,
536
+ "grad_norm": 1.547676682472229,
537
+ "learning_rate": 0.00011282051282051283,
538
+ "loss": 0.234,
539
+ "step": 680
540
+ },
541
+ {
542
+ "epoch": 1.7692307692307692,
543
+ "grad_norm": 1.9279000759124756,
544
+ "learning_rate": 0.00011153846153846154,
545
+ "loss": 0.4459,
546
+ "step": 690
547
+ },
548
+ {
549
+ "epoch": 1.7948717948717947,
550
+ "grad_norm": 2.7669806480407715,
551
+ "learning_rate": 0.00011025641025641027,
552
+ "loss": 0.2824,
553
+ "step": 700
554
+ },
555
+ {
556
+ "epoch": 1.7948717948717947,
557
+ "eval_accuracy": 0.8518181818181818,
558
+ "eval_loss": 0.33288896083831787,
559
+ "eval_runtime": 23.8646,
560
+ "eval_samples_per_second": 46.093,
561
+ "eval_steps_per_second": 5.783,
562
+ "step": 700
563
+ },
564
+ {
565
+ "epoch": 1.8205128205128205,
566
+ "grad_norm": 1.626448154449463,
567
+ "learning_rate": 0.00010897435897435896,
568
+ "loss": 0.2789,
569
+ "step": 710
570
+ },
571
+ {
572
+ "epoch": 1.8461538461538463,
573
+ "grad_norm": 2.5008246898651123,
574
+ "learning_rate": 0.0001076923076923077,
575
+ "loss": 0.2939,
576
+ "step": 720
577
+ },
578
+ {
579
+ "epoch": 1.8717948717948718,
580
+ "grad_norm": 1.4484879970550537,
581
+ "learning_rate": 0.00010641025641025641,
582
+ "loss": 0.3107,
583
+ "step": 730
584
+ },
585
+ {
586
+ "epoch": 1.8974358974358974,
587
+ "grad_norm": 2.9797451496124268,
588
+ "learning_rate": 0.00010512820512820514,
589
+ "loss": 0.1989,
590
+ "step": 740
591
+ },
592
+ {
593
+ "epoch": 1.9230769230769231,
594
+ "grad_norm": 2.551682710647583,
595
+ "learning_rate": 0.00010384615384615386,
596
+ "loss": 0.3133,
597
+ "step": 750
598
+ },
599
+ {
600
+ "epoch": 1.9487179487179487,
601
+ "grad_norm": 3.318741798400879,
602
+ "learning_rate": 0.00010256410256410256,
603
+ "loss": 0.2384,
604
+ "step": 760
605
+ },
606
+ {
607
+ "epoch": 1.9743589743589745,
608
+ "grad_norm": 0.9309015274047852,
609
+ "learning_rate": 0.00010128205128205129,
610
+ "loss": 0.2314,
611
+ "step": 770
612
+ },
613
+ {
614
+ "epoch": 2.0,
615
+ "grad_norm": 3.10331392288208,
616
+ "learning_rate": 0.0001,
617
+ "loss": 0.2341,
618
+ "step": 780
619
+ },
620
+ {
621
+ "epoch": 2.0256410256410255,
622
+ "grad_norm": 2.6792232990264893,
623
+ "learning_rate": 9.871794871794872e-05,
624
+ "loss": 0.1841,
625
+ "step": 790
626
+ },
627
+ {
628
+ "epoch": 2.051282051282051,
629
+ "grad_norm": 2.074448585510254,
630
+ "learning_rate": 9.743589743589744e-05,
631
+ "loss": 0.2366,
632
+ "step": 800
633
+ },
634
+ {
635
+ "epoch": 2.051282051282051,
636
+ "eval_accuracy": 0.8254545454545454,
637
+ "eval_loss": 0.45823609828948975,
638
+ "eval_runtime": 24.0195,
639
+ "eval_samples_per_second": 45.796,
640
+ "eval_steps_per_second": 5.745,
641
+ "step": 800
642
+ },
643
+ {
644
+ "epoch": 2.076923076923077,
645
+ "grad_norm": 0.9467771649360657,
646
+ "learning_rate": 9.615384615384617e-05,
647
+ "loss": 0.1895,
648
+ "step": 810
649
+ },
650
+ {
651
+ "epoch": 2.1025641025641026,
652
+ "grad_norm": 3.1332082748413086,
653
+ "learning_rate": 9.487179487179487e-05,
654
+ "loss": 0.2665,
655
+ "step": 820
656
+ },
657
+ {
658
+ "epoch": 2.128205128205128,
659
+ "grad_norm": 3.9276282787323,
660
+ "learning_rate": 9.35897435897436e-05,
661
+ "loss": 0.2388,
662
+ "step": 830
663
+ },
664
+ {
665
+ "epoch": 2.1538461538461537,
666
+ "grad_norm": 2.7033755779266357,
667
+ "learning_rate": 9.230769230769232e-05,
668
+ "loss": 0.1917,
669
+ "step": 840
670
+ },
671
+ {
672
+ "epoch": 2.1794871794871793,
673
+ "grad_norm": 1.5250920057296753,
674
+ "learning_rate": 9.102564102564103e-05,
675
+ "loss": 0.3245,
676
+ "step": 850
677
+ },
678
+ {
679
+ "epoch": 2.2051282051282053,
680
+ "grad_norm": 1.5101457834243774,
681
+ "learning_rate": 8.974358974358975e-05,
682
+ "loss": 0.1377,
683
+ "step": 860
684
+ },
685
+ {
686
+ "epoch": 2.230769230769231,
687
+ "grad_norm": 0.743198573589325,
688
+ "learning_rate": 8.846153846153847e-05,
689
+ "loss": 0.2243,
690
+ "step": 870
691
+ },
692
+ {
693
+ "epoch": 2.2564102564102564,
694
+ "grad_norm": 5.429717540740967,
695
+ "learning_rate": 8.717948717948718e-05,
696
+ "loss": 0.2683,
697
+ "step": 880
698
+ },
699
+ {
700
+ "epoch": 2.282051282051282,
701
+ "grad_norm": 2.3276283740997314,
702
+ "learning_rate": 8.58974358974359e-05,
703
+ "loss": 0.2083,
704
+ "step": 890
705
+ },
706
+ {
707
+ "epoch": 2.3076923076923075,
708
+ "grad_norm": 1.3464454412460327,
709
+ "learning_rate": 8.461538461538461e-05,
710
+ "loss": 0.2212,
711
+ "step": 900
712
+ },
713
+ {
714
+ "epoch": 2.3076923076923075,
715
+ "eval_accuracy": 0.8254545454545454,
716
+ "eval_loss": 0.4885379374027252,
717
+ "eval_runtime": 24.2062,
718
+ "eval_samples_per_second": 45.443,
719
+ "eval_steps_per_second": 5.701,
720
+ "step": 900
721
+ },
722
+ {
723
+ "epoch": 2.3333333333333335,
724
+ "grad_norm": 2.0947823524475098,
725
+ "learning_rate": 8.333333333333334e-05,
726
+ "loss": 0.1469,
727
+ "step": 910
728
+ },
729
+ {
730
+ "epoch": 2.358974358974359,
731
+ "grad_norm": 2.0833053588867188,
732
+ "learning_rate": 8.205128205128205e-05,
733
+ "loss": 0.1637,
734
+ "step": 920
735
+ },
736
+ {
737
+ "epoch": 2.3846153846153846,
738
+ "grad_norm": 3.5598833560943604,
739
+ "learning_rate": 8.076923076923078e-05,
740
+ "loss": 0.2127,
741
+ "step": 930
742
+ },
743
+ {
744
+ "epoch": 2.41025641025641,
745
+ "grad_norm": 5.30457878112793,
746
+ "learning_rate": 7.948717948717948e-05,
747
+ "loss": 0.1561,
748
+ "step": 940
749
+ },
750
+ {
751
+ "epoch": 2.435897435897436,
752
+ "grad_norm": 2.163148880004883,
753
+ "learning_rate": 7.820512820512821e-05,
754
+ "loss": 0.1539,
755
+ "step": 950
756
+ },
757
+ {
758
+ "epoch": 2.4615384615384617,
759
+ "grad_norm": 4.815582752227783,
760
+ "learning_rate": 7.692307692307693e-05,
761
+ "loss": 0.1802,
762
+ "step": 960
763
+ },
764
+ {
765
+ "epoch": 2.4871794871794872,
766
+ "grad_norm": 1.3422257900238037,
767
+ "learning_rate": 7.564102564102564e-05,
768
+ "loss": 0.2059,
769
+ "step": 970
770
+ },
771
+ {
772
+ "epoch": 2.5128205128205128,
773
+ "grad_norm": 2.441047430038452,
774
+ "learning_rate": 7.435897435897436e-05,
775
+ "loss": 0.2389,
776
+ "step": 980
777
+ },
778
+ {
779
+ "epoch": 2.5384615384615383,
780
+ "grad_norm": 3.386566162109375,
781
+ "learning_rate": 7.307692307692307e-05,
782
+ "loss": 0.1882,
783
+ "step": 990
784
+ },
785
+ {
786
+ "epoch": 2.564102564102564,
787
+ "grad_norm": 3.123497247695923,
788
+ "learning_rate": 7.17948717948718e-05,
789
+ "loss": 0.2031,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 2.564102564102564,
794
+ "eval_accuracy": 0.8563636363636363,
795
+ "eval_loss": 0.42818111181259155,
796
+ "eval_runtime": 24.1992,
797
+ "eval_samples_per_second": 45.456,
798
+ "eval_steps_per_second": 5.703,
799
+ "step": 1000
800
+ },
801
+ {
802
+ "epoch": 2.58974358974359,
803
+ "grad_norm": 2.518524646759033,
804
+ "learning_rate": 7.051282051282052e-05,
805
+ "loss": 0.2551,
806
+ "step": 1010
807
+ },
808
+ {
809
+ "epoch": 2.6153846153846154,
810
+ "grad_norm": 2.376194715499878,
811
+ "learning_rate": 6.923076923076924e-05,
812
+ "loss": 0.1853,
813
+ "step": 1020
814
+ },
815
+ {
816
+ "epoch": 2.641025641025641,
817
+ "grad_norm": 0.8307498097419739,
818
+ "learning_rate": 6.794871794871795e-05,
819
+ "loss": 0.1643,
820
+ "step": 1030
821
+ },
822
+ {
823
+ "epoch": 2.6666666666666665,
824
+ "grad_norm": 2.050661087036133,
825
+ "learning_rate": 6.666666666666667e-05,
826
+ "loss": 0.105,
827
+ "step": 1040
828
+ },
829
+ {
830
+ "epoch": 2.6923076923076925,
831
+ "grad_norm": 2.984266996383667,
832
+ "learning_rate": 6.538461538461539e-05,
833
+ "loss": 0.1774,
834
+ "step": 1050
835
+ },
836
+ {
837
+ "epoch": 2.717948717948718,
838
+ "grad_norm": 3.933162212371826,
839
+ "learning_rate": 6.410256410256412e-05,
840
+ "loss": 0.1079,
841
+ "step": 1060
842
+ },
843
+ {
844
+ "epoch": 2.7435897435897436,
845
+ "grad_norm": 4.650693893432617,
846
+ "learning_rate": 6.282051282051282e-05,
847
+ "loss": 0.1542,
848
+ "step": 1070
849
+ },
850
+ {
851
+ "epoch": 2.769230769230769,
852
+ "grad_norm": 2.796116828918457,
853
+ "learning_rate": 6.153846153846155e-05,
854
+ "loss": 0.4463,
855
+ "step": 1080
856
+ },
857
+ {
858
+ "epoch": 2.7948717948717947,
859
+ "grad_norm": 3.1699883937835693,
860
+ "learning_rate": 6.025641025641026e-05,
861
+ "loss": 0.1348,
862
+ "step": 1090
863
+ },
864
+ {
865
+ "epoch": 2.8205128205128203,
866
+ "grad_norm": 2.0635199546813965,
867
+ "learning_rate": 5.897435897435898e-05,
868
+ "loss": 0.1717,
869
+ "step": 1100
870
+ },
871
+ {
872
+ "epoch": 2.8205128205128203,
873
+ "eval_accuracy": 0.85,
874
+ "eval_loss": 0.4373130798339844,
875
+ "eval_runtime": 23.6976,
876
+ "eval_samples_per_second": 46.418,
877
+ "eval_steps_per_second": 5.823,
878
+ "step": 1100
879
+ },
880
+ {
881
+ "epoch": 2.8461538461538463,
882
+ "grad_norm": 3.7501204013824463,
883
+ "learning_rate": 5.769230769230769e-05,
884
+ "loss": 0.1505,
885
+ "step": 1110
886
+ },
887
+ {
888
+ "epoch": 2.871794871794872,
889
+ "grad_norm": 0.866908609867096,
890
+ "learning_rate": 5.6410256410256414e-05,
891
+ "loss": 0.131,
892
+ "step": 1120
893
+ },
894
+ {
895
+ "epoch": 2.8974358974358974,
896
+ "grad_norm": 2.7631490230560303,
897
+ "learning_rate": 5.512820512820514e-05,
898
+ "loss": 0.1067,
899
+ "step": 1130
900
+ },
901
+ {
902
+ "epoch": 2.9230769230769234,
903
+ "grad_norm": 0.8835192918777466,
904
+ "learning_rate": 5.384615384615385e-05,
905
+ "loss": 0.2449,
906
+ "step": 1140
907
+ },
908
+ {
909
+ "epoch": 2.948717948717949,
910
+ "grad_norm": 0.17269015312194824,
911
+ "learning_rate": 5.256410256410257e-05,
912
+ "loss": 0.1235,
913
+ "step": 1150
914
+ },
915
+ {
916
+ "epoch": 2.9743589743589745,
917
+ "grad_norm": 2.5380775928497314,
918
+ "learning_rate": 5.128205128205128e-05,
919
+ "loss": 0.1214,
920
+ "step": 1160
921
+ },
922
+ {
923
+ "epoch": 3.0,
924
+ "grad_norm": 9.240225791931152,
925
+ "learning_rate": 5e-05,
926
+ "loss": 0.2372,
927
+ "step": 1170
928
+ },
929
+ {
930
+ "epoch": 3.0256410256410255,
931
+ "grad_norm": 0.454428106546402,
932
+ "learning_rate": 4.871794871794872e-05,
933
+ "loss": 0.1121,
934
+ "step": 1180
935
+ },
936
+ {
937
+ "epoch": 3.051282051282051,
938
+ "grad_norm": 3.3110735416412354,
939
+ "learning_rate": 4.7435897435897435e-05,
940
+ "loss": 0.1121,
941
+ "step": 1190
942
+ },
943
+ {
944
+ "epoch": 3.076923076923077,
945
+ "grad_norm": 0.4833953380584717,
946
+ "learning_rate": 4.615384615384616e-05,
947
+ "loss": 0.1303,
948
+ "step": 1200
949
+ },
950
+ {
951
+ "epoch": 3.076923076923077,
952
+ "eval_accuracy": 0.8718181818181818,
953
+ "eval_loss": 0.36585894227027893,
954
+ "eval_runtime": 24.2959,
955
+ "eval_samples_per_second": 45.275,
956
+ "eval_steps_per_second": 5.68,
957
+ "step": 1200
958
+ },
959
+ {
960
+ "epoch": 3.1025641025641026,
961
+ "grad_norm": 0.08100098371505737,
962
+ "learning_rate": 4.4871794871794874e-05,
963
+ "loss": 0.0911,
964
+ "step": 1210
965
+ },
966
+ {
967
+ "epoch": 3.128205128205128,
968
+ "grad_norm": 0.30585813522338867,
969
+ "learning_rate": 4.358974358974359e-05,
970
+ "loss": 0.0834,
971
+ "step": 1220
972
+ },
973
+ {
974
+ "epoch": 3.1538461538461537,
975
+ "grad_norm": 4.129181385040283,
976
+ "learning_rate": 4.230769230769231e-05,
977
+ "loss": 0.1144,
978
+ "step": 1230
979
+ },
980
+ {
981
+ "epoch": 3.1794871794871793,
982
+ "grad_norm": 0.367727667093277,
983
+ "learning_rate": 4.1025641025641023e-05,
984
+ "loss": 0.0808,
985
+ "step": 1240
986
+ },
987
+ {
988
+ "epoch": 3.2051282051282053,
989
+ "grad_norm": 0.10303868353366852,
990
+ "learning_rate": 3.974358974358974e-05,
991
+ "loss": 0.1758,
992
+ "step": 1250
993
+ },
994
+ {
995
+ "epoch": 3.230769230769231,
996
+ "grad_norm": 2.300645589828491,
997
+ "learning_rate": 3.846153846153846e-05,
998
+ "loss": 0.227,
999
+ "step": 1260
1000
+ },
1001
+ {
1002
+ "epoch": 3.2564102564102564,
1003
+ "grad_norm": 1.345780372619629,
1004
+ "learning_rate": 3.717948717948718e-05,
1005
+ "loss": 0.1345,
1006
+ "step": 1270
1007
+ },
1008
+ {
1009
+ "epoch": 3.282051282051282,
1010
+ "grad_norm": 2.5391829013824463,
1011
+ "learning_rate": 3.58974358974359e-05,
1012
+ "loss": 0.0496,
1013
+ "step": 1280
1014
+ },
1015
+ {
1016
+ "epoch": 3.3076923076923075,
1017
+ "grad_norm": 0.31912463903427124,
1018
+ "learning_rate": 3.461538461538462e-05,
1019
+ "loss": 0.1165,
1020
+ "step": 1290
1021
+ },
1022
+ {
1023
+ "epoch": 3.3333333333333335,
1024
+ "grad_norm": 0.5431106686592102,
1025
+ "learning_rate": 3.3333333333333335e-05,
1026
+ "loss": 0.0889,
1027
+ "step": 1300
1028
+ },
1029
+ {
1030
+ "epoch": 3.3333333333333335,
1031
+ "eval_accuracy": 0.8736363636363637,
1032
+ "eval_loss": 0.3662668764591217,
1033
+ "eval_runtime": 23.4444,
1034
+ "eval_samples_per_second": 46.92,
1035
+ "eval_steps_per_second": 5.886,
1036
+ "step": 1300
1037
+ },
1038
+ {
1039
+ "epoch": 3.358974358974359,
1040
+ "grad_norm": 2.443268299102783,
1041
+ "learning_rate": 3.205128205128206e-05,
1042
+ "loss": 0.1256,
1043
+ "step": 1310
1044
+ },
1045
+ {
1046
+ "epoch": 3.3846153846153846,
1047
+ "grad_norm": 2.0804026126861572,
1048
+ "learning_rate": 3.0769230769230774e-05,
1049
+ "loss": 0.0973,
1050
+ "step": 1320
1051
+ },
1052
+ {
1053
+ "epoch": 3.41025641025641,
1054
+ "grad_norm": 10.397607803344727,
1055
+ "learning_rate": 2.948717948717949e-05,
1056
+ "loss": 0.1183,
1057
+ "step": 1330
1058
+ },
1059
+ {
1060
+ "epoch": 3.435897435897436,
1061
+ "grad_norm": 3.746250867843628,
1062
+ "learning_rate": 2.8205128205128207e-05,
1063
+ "loss": 0.046,
1064
+ "step": 1340
1065
+ },
1066
+ {
1067
+ "epoch": 3.4615384615384617,
1068
+ "grad_norm": 0.7118757367134094,
1069
+ "learning_rate": 2.6923076923076923e-05,
1070
+ "loss": 0.1611,
1071
+ "step": 1350
1072
+ },
1073
+ {
1074
+ "epoch": 3.4871794871794872,
1075
+ "grad_norm": 0.34771645069122314,
1076
+ "learning_rate": 2.564102564102564e-05,
1077
+ "loss": 0.1974,
1078
+ "step": 1360
1079
+ },
1080
+ {
1081
+ "epoch": 3.5128205128205128,
1082
+ "grad_norm": 6.590170860290527,
1083
+ "learning_rate": 2.435897435897436e-05,
1084
+ "loss": 0.1392,
1085
+ "step": 1370
1086
+ },
1087
+ {
1088
+ "epoch": 3.5384615384615383,
1089
+ "grad_norm": 3.6979663372039795,
1090
+ "learning_rate": 2.307692307692308e-05,
1091
+ "loss": 0.1153,
1092
+ "step": 1380
1093
+ },
1094
+ {
1095
+ "epoch": 3.564102564102564,
1096
+ "grad_norm": 0.12197946012020111,
1097
+ "learning_rate": 2.1794871794871795e-05,
1098
+ "loss": 0.1027,
1099
+ "step": 1390
1100
+ },
1101
+ {
1102
+ "epoch": 3.58974358974359,
1103
+ "grad_norm": 2.5246639251708984,
1104
+ "learning_rate": 2.0512820512820512e-05,
1105
+ "loss": 0.1157,
1106
+ "step": 1400
1107
+ },
1108
+ {
1109
+ "epoch": 3.58974358974359,
1110
+ "eval_accuracy": 0.8436363636363636,
1111
+ "eval_loss": 0.4587700366973877,
1112
+ "eval_runtime": 22.8201,
1113
+ "eval_samples_per_second": 48.203,
1114
+ "eval_steps_per_second": 6.047,
1115
+ "step": 1400
1116
+ },
1117
+ {
1118
+ "epoch": 3.6153846153846154,
1119
+ "grad_norm": 0.37446674704551697,
1120
+ "learning_rate": 1.923076923076923e-05,
1121
+ "loss": 0.0839,
1122
+ "step": 1410
1123
+ },
1124
+ {
1125
+ "epoch": 3.641025641025641,
1126
+ "grad_norm": 0.7361642718315125,
1127
+ "learning_rate": 1.794871794871795e-05,
1128
+ "loss": 0.0541,
1129
+ "step": 1420
1130
+ },
1131
+ {
1132
+ "epoch": 3.6666666666666665,
1133
+ "grad_norm": 0.11162062734365463,
1134
+ "learning_rate": 1.6666666666666667e-05,
1135
+ "loss": 0.1791,
1136
+ "step": 1430
1137
+ },
1138
+ {
1139
+ "epoch": 3.6923076923076925,
1140
+ "grad_norm": 3.2151377201080322,
1141
+ "learning_rate": 1.5384615384615387e-05,
1142
+ "loss": 0.0597,
1143
+ "step": 1440
1144
+ },
1145
+ {
1146
+ "epoch": 3.717948717948718,
1147
+ "grad_norm": 0.853471040725708,
1148
+ "learning_rate": 1.4102564102564104e-05,
1149
+ "loss": 0.1246,
1150
+ "step": 1450
1151
+ },
1152
+ {
1153
+ "epoch": 3.7435897435897436,
1154
+ "grad_norm": 0.2989501953125,
1155
+ "learning_rate": 1.282051282051282e-05,
1156
+ "loss": 0.0747,
1157
+ "step": 1460
1158
+ },
1159
+ {
1160
+ "epoch": 3.769230769230769,
1161
+ "grad_norm": 0.4194205403327942,
1162
+ "learning_rate": 1.153846153846154e-05,
1163
+ "loss": 0.078,
1164
+ "step": 1470
1165
+ },
1166
+ {
1167
+ "epoch": 3.7948717948717947,
1168
+ "grad_norm": 0.2623525857925415,
1169
+ "learning_rate": 1.0256410256410256e-05,
1170
+ "loss": 0.064,
1171
+ "step": 1480
1172
+ },
1173
+ {
1174
+ "epoch": 3.8205128205128203,
1175
+ "grad_norm": 1.1962109804153442,
1176
+ "learning_rate": 8.974358974358976e-06,
1177
+ "loss": 0.0955,
1178
+ "step": 1490
1179
+ },
1180
+ {
1181
+ "epoch": 3.8461538461538463,
1182
+ "grad_norm": 2.009432792663574,
1183
+ "learning_rate": 7.692307692307694e-06,
1184
+ "loss": 0.1215,
1185
+ "step": 1500
1186
+ },
1187
+ {
1188
+ "epoch": 3.8461538461538463,
1189
+ "eval_accuracy": 0.8654545454545455,
1190
+ "eval_loss": 0.43503817915916443,
1191
+ "eval_runtime": 23.6622,
1192
+ "eval_samples_per_second": 46.488,
1193
+ "eval_steps_per_second": 5.832,
1194
+ "step": 1500
1195
+ },
1196
+ {
1197
+ "epoch": 3.871794871794872,
1198
+ "grad_norm": 3.284787178039551,
1199
+ "learning_rate": 6.41025641025641e-06,
1200
+ "loss": 0.0614,
1201
+ "step": 1510
1202
+ },
1203
+ {
1204
+ "epoch": 3.8974358974358974,
1205
+ "grad_norm": 0.1390266716480255,
1206
+ "learning_rate": 5.128205128205128e-06,
1207
+ "loss": 0.0795,
1208
+ "step": 1520
1209
+ },
1210
+ {
1211
+ "epoch": 3.9230769230769234,
1212
+ "grad_norm": 3.4633984565734863,
1213
+ "learning_rate": 3.846153846153847e-06,
1214
+ "loss": 0.1268,
1215
+ "step": 1530
1216
+ },
1217
+ {
1218
+ "epoch": 3.948717948717949,
1219
+ "grad_norm": 3.78682804107666,
1220
+ "learning_rate": 2.564102564102564e-06,
1221
+ "loss": 0.1049,
1222
+ "step": 1540
1223
+ },
1224
+ {
1225
+ "epoch": 3.9743589743589745,
1226
+ "grad_norm": 3.6551170349121094,
1227
+ "learning_rate": 1.282051282051282e-06,
1228
+ "loss": 0.0924,
1229
+ "step": 1550
1230
+ },
1231
+ {
1232
+ "epoch": 4.0,
1233
+ "grad_norm": 0.1017698347568512,
1234
+ "learning_rate": 0.0,
1235
+ "loss": 0.1392,
1236
+ "step": 1560
1237
+ },
1238
+ {
1239
+ "epoch": 4.0,
1240
+ "step": 1560,
1241
+ "total_flos": 1.9301704773202575e+18,
1242
+ "train_loss": 0.266804637053074,
1243
+ "train_runtime": 1157.865,
1244
+ "train_samples_per_second": 21.512,
1245
+ "train_steps_per_second": 1.347
1246
+ }
1247
+ ],
1248
+ "logging_steps": 10,
1249
+ "max_steps": 1560,
1250
+ "num_input_tokens_seen": 0,
1251
+ "num_train_epochs": 4,
1252
+ "save_steps": 100,
1253
+ "stateful_callbacks": {
1254
+ "TrainerControl": {
1255
+ "args": {
1256
+ "should_epoch_stop": false,
1257
+ "should_evaluate": false,
1258
+ "should_log": false,
1259
+ "should_save": true,
1260
+ "should_training_stop": true
1261
+ },
1262
+ "attributes": {}
1263
+ }
1264
+ },
1265
+ "total_flos": 1.9301704773202575e+18,
1266
+ "train_batch_size": 16,
1267
+ "trial_name": null,
1268
+ "trial_params": null
1269
+ }
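trainer_state.json records the full log_history along with best_metric and best_model_checkpoint (checkpoint-700). The accuracy reported in the updated model card (0.8518) matches the step-700 evaluation, the entry with the lowest eval_loss, which is consistent with the best checkpoint being kept at the end of training. A minimal sketch for recovering that entry from the file (assuming it is available locally):

```python
import json

# Scan log_history for evaluation entries and report the one with the
# lowest eval_loss; for this run it is step 700, matching
# best_model_checkpoint above.
with open("trainer_state.json") as f:
    state = json.load(f)

evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(f"best step     : {best['step']}")
print(f"eval_loss     : {best['eval_loss']:.4f}")      # 0.3329
print(f"eval_accuracy : {best['eval_accuracy']:.4f}")  # 0.8518
```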