PooryaPiroozfar commited on
Commit
807b87e
1 Parent(s): 3450d25

new version of Persian NER

Browse files
Files changed (1) hide show
  1. training.log +735 -0
training.log ADDED
@@ -0,0 +1,735 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 2023-01-08 08:23:21,495 ----------------------------------------------------------------------------------------------------
2
+ 2023-01-08 08:23:21,498 Model: "SequenceTagger(
3
+ (embeddings): TransformerWordEmbeddings(
4
+ (model): BertModel(
5
+ (embeddings): BertEmbeddings(
6
+ (word_embeddings): Embedding(100000, 768, padding_idx=0)
7
+ (position_embeddings): Embedding(512, 768)
8
+ (token_type_embeddings): Embedding(2, 768)
9
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
10
+ (dropout): Dropout(p=0.1, inplace=False)
11
+ )
12
+ (encoder): BertEncoder(
13
+ (layer): ModuleList(
14
+ (0): BertLayer(
15
+ (attention): BertAttention(
16
+ (self): BertSelfAttention(
17
+ (query): Linear(in_features=768, out_features=768, bias=True)
18
+ (key): Linear(in_features=768, out_features=768, bias=True)
19
+ (value): Linear(in_features=768, out_features=768, bias=True)
20
+ (dropout): Dropout(p=0.1, inplace=False)
21
+ )
22
+ (output): BertSelfOutput(
23
+ (dense): Linear(in_features=768, out_features=768, bias=True)
24
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
25
+ (dropout): Dropout(p=0.1, inplace=False)
26
+ )
27
+ )
28
+ (intermediate): BertIntermediate(
29
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
30
+ (intermediate_act_fn): GELUActivation()
31
+ )
32
+ (output): BertOutput(
33
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
34
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
35
+ (dropout): Dropout(p=0.1, inplace=False)
36
+ )
37
+ )
38
+ (1): BertLayer(
39
+ (attention): BertAttention(
40
+ (self): BertSelfAttention(
41
+ (query): Linear(in_features=768, out_features=768, bias=True)
42
+ (key): Linear(in_features=768, out_features=768, bias=True)
43
+ (value): Linear(in_features=768, out_features=768, bias=True)
44
+ (dropout): Dropout(p=0.1, inplace=False)
45
+ )
46
+ (output): BertSelfOutput(
47
+ (dense): Linear(in_features=768, out_features=768, bias=True)
48
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
49
+ (dropout): Dropout(p=0.1, inplace=False)
50
+ )
51
+ )
52
+ (intermediate): BertIntermediate(
53
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
54
+ (intermediate_act_fn): GELUActivation()
55
+ )
56
+ (output): BertOutput(
57
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
58
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
59
+ (dropout): Dropout(p=0.1, inplace=False)
60
+ )
61
+ )
62
+ (2): BertLayer(
63
+ (attention): BertAttention(
64
+ (self): BertSelfAttention(
65
+ (query): Linear(in_features=768, out_features=768, bias=True)
66
+ (key): Linear(in_features=768, out_features=768, bias=True)
67
+ (value): Linear(in_features=768, out_features=768, bias=True)
68
+ (dropout): Dropout(p=0.1, inplace=False)
69
+ )
70
+ (output): BertSelfOutput(
71
+ (dense): Linear(in_features=768, out_features=768, bias=True)
72
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
73
+ (dropout): Dropout(p=0.1, inplace=False)
74
+ )
75
+ )
76
+ (intermediate): BertIntermediate(
77
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
78
+ (intermediate_act_fn): GELUActivation()
79
+ )
80
+ (output): BertOutput(
81
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
82
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
83
+ (dropout): Dropout(p=0.1, inplace=False)
84
+ )
85
+ )
86
+ (3): BertLayer(
87
+ (attention): BertAttention(
88
+ (self): BertSelfAttention(
89
+ (query): Linear(in_features=768, out_features=768, bias=True)
90
+ (key): Linear(in_features=768, out_features=768, bias=True)
91
+ (value): Linear(in_features=768, out_features=768, bias=True)
92
+ (dropout): Dropout(p=0.1, inplace=False)
93
+ )
94
+ (output): BertSelfOutput(
95
+ (dense): Linear(in_features=768, out_features=768, bias=True)
96
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
97
+ (dropout): Dropout(p=0.1, inplace=False)
98
+ )
99
+ )
100
+ (intermediate): BertIntermediate(
101
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
102
+ (intermediate_act_fn): GELUActivation()
103
+ )
104
+ (output): BertOutput(
105
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
106
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
107
+ (dropout): Dropout(p=0.1, inplace=False)
108
+ )
109
+ )
110
+ (4): BertLayer(
111
+ (attention): BertAttention(
112
+ (self): BertSelfAttention(
113
+ (query): Linear(in_features=768, out_features=768, bias=True)
114
+ (key): Linear(in_features=768, out_features=768, bias=True)
115
+ (value): Linear(in_features=768, out_features=768, bias=True)
116
+ (dropout): Dropout(p=0.1, inplace=False)
117
+ )
118
+ (output): BertSelfOutput(
119
+ (dense): Linear(in_features=768, out_features=768, bias=True)
120
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
121
+ (dropout): Dropout(p=0.1, inplace=False)
122
+ )
123
+ )
124
+ (intermediate): BertIntermediate(
125
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
126
+ (intermediate_act_fn): GELUActivation()
127
+ )
128
+ (output): BertOutput(
129
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
130
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
131
+ (dropout): Dropout(p=0.1, inplace=False)
132
+ )
133
+ )
134
+ (5): BertLayer(
135
+ (attention): BertAttention(
136
+ (self): BertSelfAttention(
137
+ (query): Linear(in_features=768, out_features=768, bias=True)
138
+ (key): Linear(in_features=768, out_features=768, bias=True)
139
+ (value): Linear(in_features=768, out_features=768, bias=True)
140
+ (dropout): Dropout(p=0.1, inplace=False)
141
+ )
142
+ (output): BertSelfOutput(
143
+ (dense): Linear(in_features=768, out_features=768, bias=True)
144
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
145
+ (dropout): Dropout(p=0.1, inplace=False)
146
+ )
147
+ )
148
+ (intermediate): BertIntermediate(
149
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
150
+ (intermediate_act_fn): GELUActivation()
151
+ )
152
+ (output): BertOutput(
153
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
154
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
155
+ (dropout): Dropout(p=0.1, inplace=False)
156
+ )
157
+ )
158
+ (6): BertLayer(
159
+ (attention): BertAttention(
160
+ (self): BertSelfAttention(
161
+ (query): Linear(in_features=768, out_features=768, bias=True)
162
+ (key): Linear(in_features=768, out_features=768, bias=True)
163
+ (value): Linear(in_features=768, out_features=768, bias=True)
164
+ (dropout): Dropout(p=0.1, inplace=False)
165
+ )
166
+ (output): BertSelfOutput(
167
+ (dense): Linear(in_features=768, out_features=768, bias=True)
168
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
169
+ (dropout): Dropout(p=0.1, inplace=False)
170
+ )
171
+ )
172
+ (intermediate): BertIntermediate(
173
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
174
+ (intermediate_act_fn): GELUActivation()
175
+ )
176
+ (output): BertOutput(
177
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
178
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
179
+ (dropout): Dropout(p=0.1, inplace=False)
180
+ )
181
+ )
182
+ (7): BertLayer(
183
+ (attention): BertAttention(
184
+ (self): BertSelfAttention(
185
+ (query): Linear(in_features=768, out_features=768, bias=True)
186
+ (key): Linear(in_features=768, out_features=768, bias=True)
187
+ (value): Linear(in_features=768, out_features=768, bias=True)
188
+ (dropout): Dropout(p=0.1, inplace=False)
189
+ )
190
+ (output): BertSelfOutput(
191
+ (dense): Linear(in_features=768, out_features=768, bias=True)
192
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
193
+ (dropout): Dropout(p=0.1, inplace=False)
194
+ )
195
+ )
196
+ (intermediate): BertIntermediate(
197
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
198
+ (intermediate_act_fn): GELUActivation()
199
+ )
200
+ (output): BertOutput(
201
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
202
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
203
+ (dropout): Dropout(p=0.1, inplace=False)
204
+ )
205
+ )
206
+ (8): BertLayer(
207
+ (attention): BertAttention(
208
+ (self): BertSelfAttention(
209
+ (query): Linear(in_features=768, out_features=768, bias=True)
210
+ (key): Linear(in_features=768, out_features=768, bias=True)
211
+ (value): Linear(in_features=768, out_features=768, bias=True)
212
+ (dropout): Dropout(p=0.1, inplace=False)
213
+ )
214
+ (output): BertSelfOutput(
215
+ (dense): Linear(in_features=768, out_features=768, bias=True)
216
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
217
+ (dropout): Dropout(p=0.1, inplace=False)
218
+ )
219
+ )
220
+ (intermediate): BertIntermediate(
221
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
222
+ (intermediate_act_fn): GELUActivation()
223
+ )
224
+ (output): BertOutput(
225
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
226
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
227
+ (dropout): Dropout(p=0.1, inplace=False)
228
+ )
229
+ )
230
+ (9): BertLayer(
231
+ (attention): BertAttention(
232
+ (self): BertSelfAttention(
233
+ (query): Linear(in_features=768, out_features=768, bias=True)
234
+ (key): Linear(in_features=768, out_features=768, bias=True)
235
+ (value): Linear(in_features=768, out_features=768, bias=True)
236
+ (dropout): Dropout(p=0.1, inplace=False)
237
+ )
238
+ (output): BertSelfOutput(
239
+ (dense): Linear(in_features=768, out_features=768, bias=True)
240
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
241
+ (dropout): Dropout(p=0.1, inplace=False)
242
+ )
243
+ )
244
+ (intermediate): BertIntermediate(
245
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
246
+ (intermediate_act_fn): GELUActivation()
247
+ )
248
+ (output): BertOutput(
249
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
250
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
251
+ (dropout): Dropout(p=0.1, inplace=False)
252
+ )
253
+ )
254
+ (10): BertLayer(
255
+ (attention): BertAttention(
256
+ (self): BertSelfAttention(
257
+ (query): Linear(in_features=768, out_features=768, bias=True)
258
+ (key): Linear(in_features=768, out_features=768, bias=True)
259
+ (value): Linear(in_features=768, out_features=768, bias=True)
260
+ (dropout): Dropout(p=0.1, inplace=False)
261
+ )
262
+ (output): BertSelfOutput(
263
+ (dense): Linear(in_features=768, out_features=768, bias=True)
264
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
265
+ (dropout): Dropout(p=0.1, inplace=False)
266
+ )
267
+ )
268
+ (intermediate): BertIntermediate(
269
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
270
+ (intermediate_act_fn): GELUActivation()
271
+ )
272
+ (output): BertOutput(
273
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
274
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
275
+ (dropout): Dropout(p=0.1, inplace=False)
276
+ )
277
+ )
278
+ (11): BertLayer(
279
+ (attention): BertAttention(
280
+ (self): BertSelfAttention(
281
+ (query): Linear(in_features=768, out_features=768, bias=True)
282
+ (key): Linear(in_features=768, out_features=768, bias=True)
283
+ (value): Linear(in_features=768, out_features=768, bias=True)
284
+ (dropout): Dropout(p=0.1, inplace=False)
285
+ )
286
+ (output): BertSelfOutput(
287
+ (dense): Linear(in_features=768, out_features=768, bias=True)
288
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
289
+ (dropout): Dropout(p=0.1, inplace=False)
290
+ )
291
+ )
292
+ (intermediate): BertIntermediate(
293
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
294
+ (intermediate_act_fn): GELUActivation()
295
+ )
296
+ (output): BertOutput(
297
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
298
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
299
+ (dropout): Dropout(p=0.1, inplace=False)
300
+ )
301
+ )
302
+ )
303
+ )
304
+ (pooler): BertPooler(
305
+ (dense): Linear(in_features=768, out_features=768, bias=True)
306
+ (activation): Tanh()
307
+ )
308
+ )
309
+ )
310
+ (word_dropout): WordDropout(p=0.05)
311
+ (locked_dropout): LockedDropout(p=0.5)
312
+ (linear): Linear(in_features=768, out_features=18, bias=True)
313
+ (beta): 1.0
314
+ (weights): None
315
+ (weight_tensor) None
316
+ )"
317
+ 2023-01-08 08:23:21,500 ----------------------------------------------------------------------------------------------------
318
+ 2023-01-08 08:23:21,505 Corpus: "Corpus: 26116 train + 2902 dev + 1572 test sentences"
319
+ 2023-01-08 08:23:21,506 ----------------------------------------------------------------------------------------------------
320
+ 2023-01-08 08:23:21,506 Parameters:
321
+ 2023-01-08 08:23:21,507 - learning_rate: "5e-06"
322
+ 2023-01-08 08:23:21,509 - mini_batch_size: "4"
323
+ 2023-01-08 08:23:21,510 - patience: "3"
324
+ 2023-01-08 08:23:21,512 - anneal_factor: "0.5"
325
+ 2023-01-08 08:23:21,513 - max_epochs: "25"
326
+ 2023-01-08 08:23:21,513 - shuffle: "False"
327
+ 2023-01-08 08:23:21,514 - train_with_dev: "False"
328
+ 2023-01-08 08:23:21,515 - batch_growth_annealing: "False"
329
+ 2023-01-08 08:23:21,516 ----------------------------------------------------------------------------------------------------
330
+ 2023-01-08 08:23:21,517 Model training base path: "resources/taggers/nsurl_512_000_noshuffle_15epoch"
331
+ 2023-01-08 08:23:21,518 ----------------------------------------------------------------------------------------------------
332
+ 2023-01-08 08:23:21,519 Device: cuda:0
333
+ 2023-01-08 08:23:21,519 ----------------------------------------------------------------------------------------------------
334
+ 2023-01-08 08:23:21,520 Embeddings storage mode: none
335
+ 2023-01-08 08:23:21,523 ----------------------------------------------------------------------------------------------------
336
+ 2023-01-08 08:25:35,073 epoch 1 - iter 652/6529 - loss 2.76353314 - samples/sec: 19.54 - lr: 0.000000
337
+ 2023-01-08 08:27:43,826 epoch 1 - iter 1304/6529 - loss 2.10468350 - samples/sec: 20.26 - lr: 0.000000
338
+ 2023-01-08 08:29:54,350 epoch 1 - iter 1956/6529 - loss 1.67857681 - samples/sec: 19.99 - lr: 0.000001
339
+ 2023-01-08 08:32:06,509 epoch 1 - iter 2608/6529 - loss 1.40096638 - samples/sec: 19.74 - lr: 0.000001
340
+ 2023-01-08 08:34:21,272 epoch 1 - iter 3260/6529 - loss 1.20354192 - samples/sec: 19.36 - lr: 0.000001
341
+ 2023-01-08 08:36:35,474 epoch 1 - iter 3912/6529 - loss 1.06876365 - samples/sec: 19.44 - lr: 0.000001
342
+ 2023-01-08 08:38:49,910 epoch 1 - iter 4564/6529 - loss 0.96565944 - samples/sec: 19.41 - lr: 0.000001
343
+ 2023-01-08 08:41:02,964 epoch 1 - iter 5216/6529 - loss 0.89404243 - samples/sec: 19.61 - lr: 0.000002
344
+ 2023-01-08 08:43:17,807 epoch 1 - iter 5868/6529 - loss 0.82707116 - samples/sec: 19.35 - lr: 0.000002
345
+ 2023-01-08 08:45:30,584 epoch 1 - iter 6520/6529 - loss 0.77020616 - samples/sec: 19.65 - lr: 0.000002
346
+ 2023-01-08 08:45:32,301 ----------------------------------------------------------------------------------------------------
347
+ 2023-01-08 08:45:32,303 EPOCH 1 done: loss 0.7697 - lr 0.0000020
348
+ 2023-01-08 08:46:43,903 DEV : loss 0.14591681957244873 - f1-score (micro avg) 0.6513
349
+ 2023-01-08 08:46:43,968 BAD EPOCHS (no improvement): 4
350
+ 2023-01-08 08:46:43,969 ----------------------------------------------------------------------------------------------------
351
+ 2023-01-08 08:49:00,303 epoch 2 - iter 652/6529 - loss 0.26767246 - samples/sec: 19.14 - lr: 0.000002
352
+ 2023-01-08 08:51:14,930 epoch 2 - iter 1304/6529 - loss 0.26032050 - samples/sec: 19.38 - lr: 0.000002
353
+ 2023-01-08 08:53:24,163 epoch 2 - iter 1956/6529 - loss 0.25277047 - samples/sec: 20.19 - lr: 0.000003
354
+ 2023-01-08 08:55:38,396 epoch 2 - iter 2608/6529 - loss 0.25357495 - samples/sec: 19.44 - lr: 0.000003
355
+ 2023-01-08 08:57:51,341 epoch 2 - iter 3260/6529 - loss 0.25742882 - samples/sec: 19.63 - lr: 0.000003
356
+ 2023-01-08 09:00:05,181 epoch 2 - iter 3912/6529 - loss 0.25759538 - samples/sec: 19.49 - lr: 0.000003
357
+ 2023-01-08 09:02:19,914 epoch 2 - iter 4564/6529 - loss 0.25655432 - samples/sec: 19.37 - lr: 0.000003
358
+ 2023-01-08 09:04:32,472 epoch 2 - iter 5216/6529 - loss 0.25512109 - samples/sec: 19.68 - lr: 0.000004
359
+ 2023-01-08 09:06:45,546 epoch 2 - iter 5868/6529 - loss 0.25205262 - samples/sec: 19.61 - lr: 0.000004
360
+ 2023-01-08 09:08:59,435 epoch 2 - iter 6520/6529 - loss 0.24897569 - samples/sec: 19.49 - lr: 0.000004
361
+ 2023-01-08 09:09:01,160 ----------------------------------------------------------------------------------------------------
362
+ 2023-01-08 09:09:01,163 EPOCH 2 done: loss 0.2490 - lr 0.0000040
363
+ 2023-01-08 09:10:14,132 DEV : loss 0.09620723873376846 - f1-score (micro avg) 0.8018
364
+ 2023-01-08 09:10:14,175 BAD EPOCHS (no improvement): 4
365
+ 2023-01-08 09:10:14,176 ----------------------------------------------------------------------------------------------------
366
+ 2023-01-08 09:12:30,516 epoch 3 - iter 652/6529 - loss 0.21670855 - samples/sec: 19.14 - lr: 0.000004
367
+ 2023-01-08 09:14:44,527 epoch 3 - iter 1304/6529 - loss 0.21922916 - samples/sec: 19.47 - lr: 0.000004
368
+ 2023-01-08 09:16:56,659 epoch 3 - iter 1956/6529 - loss 0.21763112 - samples/sec: 19.75 - lr: 0.000005
369
+ 2023-01-08 09:19:11,830 epoch 3 - iter 2608/6529 - loss 0.22018173 - samples/sec: 19.30 - lr: 0.000005
370
+ 2023-01-08 09:21:25,549 epoch 3 - iter 3260/6529 - loss 0.22296623 - samples/sec: 19.51 - lr: 0.000005
371
+ 2023-01-08 09:23:49,595 epoch 3 - iter 3912/6529 - loss 0.22464455 - samples/sec: 18.11 - lr: 0.000005
372
+ 2023-01-08 09:26:08,249 epoch 3 - iter 4564/6529 - loss 0.22413709 - samples/sec: 18.82 - lr: 0.000005
373
+ 2023-01-08 09:28:21,561 epoch 3 - iter 5216/6529 - loss 0.22271140 - samples/sec: 19.57 - lr: 0.000005
374
+ 2023-01-08 09:30:39,450 epoch 3 - iter 5868/6529 - loss 0.22078188 - samples/sec: 18.92 - lr: 0.000005
375
+ 2023-01-08 09:32:56,931 epoch 3 - iter 6520/6529 - loss 0.21923857 - samples/sec: 18.98 - lr: 0.000005
376
+ 2023-01-08 09:32:58,863 ----------------------------------------------------------------------------------------------------
377
+ 2023-01-08 09:32:58,865 EPOCH 3 done: loss 0.2193 - lr 0.0000049
378
+ 2023-01-08 09:34:11,869 DEV : loss 0.08930665999650955 - f1-score (micro avg) 0.8357
379
+ 2023-01-08 09:34:11,911 BAD EPOCHS (no improvement): 4
380
+ 2023-01-08 09:34:11,912 ----------------------------------------------------------------------------------------------------
381
+ 2023-01-08 09:36:27,376 epoch 4 - iter 652/6529 - loss 0.19632441 - samples/sec: 19.26 - lr: 0.000005
382
+ 2023-01-08 09:38:41,808 epoch 4 - iter 1304/6529 - loss 0.19654954 - samples/sec: 19.41 - lr: 0.000005
383
+ 2023-01-08 09:40:53,083 epoch 4 - iter 1956/6529 - loss 0.19641485 - samples/sec: 19.88 - lr: 0.000005
384
+ 2023-01-08 09:43:06,935 epoch 4 - iter 2608/6529 - loss 0.19908824 - samples/sec: 19.49 - lr: 0.000005
385
+ 2023-01-08 09:45:22,775 epoch 4 - iter 3260/6529 - loss 0.20233334 - samples/sec: 19.21 - lr: 0.000005
386
+ 2023-01-08 09:47:38,337 epoch 4 - iter 3912/6529 - loss 0.20352574 - samples/sec: 19.25 - lr: 0.000005
387
+ 2023-01-08 09:49:52,733 epoch 4 - iter 4564/6529 - loss 0.20279599 - samples/sec: 19.41 - lr: 0.000005
388
+ 2023-01-08 09:52:08,210 epoch 4 - iter 5216/6529 - loss 0.20192930 - samples/sec: 19.26 - lr: 0.000005
389
+ 2023-01-08 09:54:24,632 epoch 4 - iter 5868/6529 - loss 0.20036623 - samples/sec: 19.13 - lr: 0.000005
390
+ 2023-01-08 09:56:39,471 epoch 4 - iter 6520/6529 - loss 0.19916323 - samples/sec: 19.35 - lr: 0.000005
391
+ 2023-01-08 09:56:41,133 ----------------------------------------------------------------------------------------------------
392
+ 2023-01-08 09:56:41,136 EPOCH 4 done: loss 0.1992 - lr 0.0000047
393
+ 2023-01-08 09:57:57,145 DEV : loss 0.0921374261379242 - f1-score (micro avg) 0.8614
394
+ 2023-01-08 09:57:57,193 BAD EPOCHS (no improvement): 4
395
+ 2023-01-08 09:57:57,195 ----------------------------------------------------------------------------------------------------
396
+ 2023-01-08 10:00:12,637 epoch 5 - iter 652/6529 - loss 0.17778101 - samples/sec: 19.26 - lr: 0.000005
397
+ 2023-01-08 10:02:28,175 epoch 5 - iter 1304/6529 - loss 0.18126676 - samples/sec: 19.25 - lr: 0.000005
398
+ 2023-01-08 10:04:43,855 epoch 5 - iter 1956/6529 - loss 0.18348900 - samples/sec: 19.23 - lr: 0.000005
399
+ 2023-01-08 10:06:58,958 epoch 5 - iter 2608/6529 - loss 0.18486018 - samples/sec: 19.31 - lr: 0.000005
400
+ 2023-01-08 10:09:15,265 epoch 5 - iter 3260/6529 - loss 0.18834373 - samples/sec: 19.14 - lr: 0.000005
401
+ 2023-01-08 10:11:31,435 epoch 5 - iter 3912/6529 - loss 0.18964518 - samples/sec: 19.16 - lr: 0.000005
402
+ 2023-01-08 10:13:46,637 epoch 5 - iter 4564/6529 - loss 0.18928872 - samples/sec: 19.30 - lr: 0.000005
403
+ 2023-01-08 10:16:01,327 epoch 5 - iter 5216/6529 - loss 0.18879883 - samples/sec: 19.37 - lr: 0.000004
404
+ 2023-01-08 10:18:16,826 epoch 5 - iter 5868/6529 - loss 0.18708286 - samples/sec: 19.26 - lr: 0.000004
405
+ 2023-01-08 10:20:34,164 epoch 5 - iter 6520/6529 - loss 0.18582796 - samples/sec: 19.00 - lr: 0.000004
406
+ 2023-01-08 10:20:36,025 ----------------------------------------------------------------------------------------------------
407
+ 2023-01-08 10:20:36,028 EPOCH 5 done: loss 0.1859 - lr 0.0000044
408
+ 2023-01-08 10:21:52,956 DEV : loss 0.09580960869789124 - f1-score (micro avg) 0.8699
409
+ 2023-01-08 10:21:53,002 BAD EPOCHS (no improvement): 4
410
+ 2023-01-08 10:21:53,004 ----------------------------------------------------------------------------------------------------
411
+ 2023-01-08 10:24:09,733 epoch 6 - iter 652/6529 - loss 0.17521655 - samples/sec: 19.08 - lr: 0.000004
412
+ 2023-01-08 10:26:21,528 epoch 6 - iter 1304/6529 - loss 0.17424610 - samples/sec: 19.80 - lr: 0.000004
413
+ 2023-01-08 10:28:34,945 epoch 6 - iter 1956/6529 - loss 0.17396923 - samples/sec: 19.56 - lr: 0.000004
414
+ 2023-01-08 10:30:49,825 epoch 6 - iter 2608/6529 - loss 0.17528702 - samples/sec: 19.34 - lr: 0.000004
415
+ 2023-01-08 10:33:06,765 epoch 6 - iter 3260/6529 - loss 0.17777277 - samples/sec: 19.05 - lr: 0.000004
416
+ 2023-01-08 10:35:23,591 epoch 6 - iter 3912/6529 - loss 0.17930874 - samples/sec: 19.07 - lr: 0.000004
417
+ 2023-01-08 10:37:41,441 epoch 6 - iter 4564/6529 - loss 0.17885569 - samples/sec: 18.93 - lr: 0.000004
418
+ 2023-01-08 10:39:56,482 epoch 6 - iter 5216/6529 - loss 0.17822966 - samples/sec: 19.32 - lr: 0.000004
419
+ 2023-01-08 10:42:11,452 epoch 6 - iter 5868/6529 - loss 0.17761229 - samples/sec: 19.33 - lr: 0.000004
420
+ 2023-01-08 10:44:29,082 epoch 6 - iter 6520/6529 - loss 0.17612404 - samples/sec: 18.96 - lr: 0.000004
421
+ 2023-01-08 10:44:31,061 ----------------------------------------------------------------------------------------------------
422
+ 2023-01-08 10:44:31,063 EPOCH 6 done: loss 0.1762 - lr 0.0000042
423
+ 2023-01-08 10:45:47,791 DEV : loss 0.10489046573638916 - f1-score (micro avg) 0.8826
424
+ 2023-01-08 10:45:47,842 BAD EPOCHS (no improvement): 4
425
+ 2023-01-08 10:45:47,844 ----------------------------------------------------------------------------------------------------
426
+ 2023-01-08 10:48:01,767 epoch 7 - iter 652/6529 - loss 0.16234851 - samples/sec: 19.48 - lr: 0.000004
427
+ 2023-01-08 10:50:17,317 epoch 7 - iter 1304/6529 - loss 0.16401460 - samples/sec: 19.25 - lr: 0.000004
428
+ 2023-01-08 10:52:30,318 epoch 7 - iter 1956/6529 - loss 0.16447709 - samples/sec: 19.62 - lr: 0.000004
429
+ 2023-01-08 10:54:46,762 epoch 7 - iter 2608/6529 - loss 0.16559102 - samples/sec: 19.12 - lr: 0.000004
430
+ 2023-01-08 10:57:04,951 epoch 7 - iter 3260/6529 - loss 0.16837598 - samples/sec: 18.88 - lr: 0.000004
431
+ 2023-01-08 10:59:20,546 epoch 7 - iter 3912/6529 - loss 0.17080542 - samples/sec: 19.24 - lr: 0.000004
432
+ 2023-01-08 11:01:39,182 epoch 7 - iter 4564/6529 - loss 0.17015294 - samples/sec: 18.82 - lr: 0.000004
433
+ 2023-01-08 11:03:54,395 epoch 7 - iter 5216/6529 - loss 0.16971769 - samples/sec: 19.30 - lr: 0.000004
434
+ 2023-01-08 11:06:10,242 epoch 7 - iter 5868/6529 - loss 0.16883507 - samples/sec: 19.21 - lr: 0.000004
435
+ 2023-01-08 11:08:28,889 epoch 7 - iter 6520/6529 - loss 0.16791804 - samples/sec: 18.82 - lr: 0.000004
436
+ 2023-01-08 11:08:30,777 ----------------------------------------------------------------------------------------------------
437
+ 2023-01-08 11:08:30,780 EPOCH 7 done: loss 0.1679 - lr 0.0000040
438
+ 2023-01-08 11:09:46,112 DEV : loss 0.10590970516204834 - f1-score (micro avg) 0.892
439
+ 2023-01-08 11:09:46,164 BAD EPOCHS (no improvement): 4
440
+ 2023-01-08 11:09:46,166 ----------------------------------------------------------------------------------------------------
441
+ 2023-01-08 11:12:00,934 epoch 8 - iter 652/6529 - loss 0.15312178 - samples/sec: 19.36 - lr: 0.000004
442
+ 2023-01-08 11:14:14,666 epoch 8 - iter 1304/6529 - loss 0.15723507 - samples/sec: 19.51 - lr: 0.000004
443
+ 2023-01-08 11:16:31,894 epoch 8 - iter 1956/6529 - loss 0.15652843 - samples/sec: 19.01 - lr: 0.000004
444
+ 2023-01-08 11:18:46,987 epoch 8 - iter 2608/6529 - loss 0.15826168 - samples/sec: 19.31 - lr: 0.000004
445
+ 2023-01-08 11:21:00,593 epoch 8 - iter 3260/6529 - loss 0.16011241 - samples/sec: 19.53 - lr: 0.000004
446
+ 2023-01-08 11:23:17,924 epoch 8 - iter 3912/6529 - loss 0.16210808 - samples/sec: 19.00 - lr: 0.000004
447
+ 2023-01-08 11:25:34,299 epoch 8 - iter 4564/6529 - loss 0.16260045 - samples/sec: 19.13 - lr: 0.000004
448
+ 2023-01-08 11:27:46,535 epoch 8 - iter 5216/6529 - loss 0.16219764 - samples/sec: 19.73 - lr: 0.000004
449
+ 2023-01-08 11:30:03,401 epoch 8 - iter 5868/6529 - loss 0.16156175 - samples/sec: 19.06 - lr: 0.000004
450
+ 2023-01-08 11:32:17,565 epoch 8 - iter 6520/6529 - loss 0.16053879 - samples/sec: 19.45 - lr: 0.000004
451
+ 2023-01-08 11:32:19,391 ----------------------------------------------------------------------------------------------------
452
+ 2023-01-08 11:32:19,393 EPOCH 8 done: loss 0.1606 - lr 0.0000038
453
+ 2023-01-08 11:33:33,542 DEV : loss 0.10866044461727142 - f1-score (micro avg) 0.889
454
+ 2023-01-08 11:33:33,590 BAD EPOCHS (no improvement): 4
455
+ 2023-01-08 11:33:33,592 ----------------------------------------------------------------------------------------------------
456
+ 2023-01-08 11:35:46,606 epoch 9 - iter 652/6529 - loss 0.15568375 - samples/sec: 19.62 - lr: 0.000004
457
+ 2023-01-08 11:38:00,623 epoch 9 - iter 1304/6529 - loss 0.15500402 - samples/sec: 19.47 - lr: 0.000004
458
+ 2023-01-08 11:40:11,536 epoch 9 - iter 1956/6529 - loss 0.15346711 - samples/sec: 19.93 - lr: 0.000004
459
+ 2023-01-08 11:42:31,019 epoch 9 - iter 2608/6529 - loss 0.15530038 - samples/sec: 18.71 - lr: 0.000004
460
+ 2023-01-08 11:44:46,689 epoch 9 - iter 3260/6529 - loss 0.15662159 - samples/sec: 19.23 - lr: 0.000004
461
+ 2023-01-08 11:47:04,958 epoch 9 - iter 3912/6529 - loss 0.15851655 - samples/sec: 18.87 - lr: 0.000004
462
+ 2023-01-08 11:49:25,939 epoch 9 - iter 4564/6529 - loss 0.15831685 - samples/sec: 18.51 - lr: 0.000004
463
+ 2023-01-08 11:51:41,077 epoch 9 - iter 5216/6529 - loss 0.15778522 - samples/sec: 19.31 - lr: 0.000004
464
+ 2023-01-08 11:54:00,178 epoch 9 - iter 5868/6529 - loss 0.15675165 - samples/sec: 18.76 - lr: 0.000004
465
+ 2023-01-08 11:56:18,653 epoch 9 - iter 6520/6529 - loss 0.15587139 - samples/sec: 18.84 - lr: 0.000004
466
+ 2023-01-08 11:56:20,505 ----------------------------------------------------------------------------------------------------
467
+ 2023-01-08 11:56:20,506 EPOCH 9 done: loss 0.1559 - lr 0.0000036
468
+ 2023-01-08 11:57:35,001 DEV : loss 0.11621606349945068 - f1-score (micro avg) 0.8955
469
+ 2023-01-08 11:57:35,052 BAD EPOCHS (no improvement): 4
470
+ 2023-01-08 11:57:35,054 ----------------------------------------------------------------------------------------------------
471
+ 2023-01-08 11:59:50,825 epoch 10 - iter 652/6529 - loss 0.14409633 - samples/sec: 19.22 - lr: 0.000004
472
+ 2023-01-08 12:02:06,533 epoch 10 - iter 1304/6529 - loss 0.14631135 - samples/sec: 19.23 - lr: 0.000004
473
+ 2023-01-08 12:04:21,322 epoch 10 - iter 1956/6529 - loss 0.14735676 - samples/sec: 19.36 - lr: 0.000003
474
+ 2023-01-08 12:06:35,422 epoch 10 - iter 2608/6529 - loss 0.14904395 - samples/sec: 19.46 - lr: 0.000003
475
+ 2023-01-08 12:08:53,778 epoch 10 - iter 3260/6529 - loss 0.15018463 - samples/sec: 18.86 - lr: 0.000003
476
+ 2023-01-08 12:11:11,643 epoch 10 - iter 3912/6529 - loss 0.15132750 - samples/sec: 18.93 - lr: 0.000003
477
+ 2023-01-08 12:13:29,660 epoch 10 - iter 4564/6529 - loss 0.15188127 - samples/sec: 18.90 - lr: 0.000003
478
+ 2023-01-08 12:15:44,264 epoch 10 - iter 5216/6529 - loss 0.15133341 - samples/sec: 19.38 - lr: 0.000003
479
+ 2023-01-08 12:18:01,119 epoch 10 - iter 5868/6529 - loss 0.15156043 - samples/sec: 19.06 - lr: 0.000003
480
+ 2023-01-08 12:20:16,350 epoch 10 - iter 6520/6529 - loss 0.15045767 - samples/sec: 19.29 - lr: 0.000003
481
+ 2023-01-08 12:20:18,235 ----------------------------------------------------------------------------------------------------
482
+ 2023-01-08 12:20:18,237 EPOCH 10 done: loss 0.1505 - lr 0.0000033
483
+ 2023-01-08 12:21:35,768 DEV : loss 0.11673574149608612 - f1-score (micro avg) 0.8996
484
+ 2023-01-08 12:21:35,818 BAD EPOCHS (no improvement): 4
485
+ 2023-01-08 12:21:35,820 ----------------------------------------------------------------------------------------------------
486
+ 2023-01-08 12:23:55,505 epoch 11 - iter 652/6529 - loss 0.14428276 - samples/sec: 18.68 - lr: 0.000003
487
+ 2023-01-08 12:26:10,978 epoch 11 - iter 1304/6529 - loss 0.14390834 - samples/sec: 19.26 - lr: 0.000003
488
+ 2023-01-08 12:28:26,666 epoch 11 - iter 1956/6529 - loss 0.14472155 - samples/sec: 19.23 - lr: 0.000003
489
+ 2023-01-08 12:30:41,813 epoch 11 - iter 2608/6529 - loss 0.14514745 - samples/sec: 19.31 - lr: 0.000003
490
+ 2023-01-08 12:32:57,301 epoch 11 - iter 3260/6529 - loss 0.14604008 - samples/sec: 19.26 - lr: 0.000003
491
+ 2023-01-08 12:35:12,776 epoch 11 - iter 3912/6529 - loss 0.14811782 - samples/sec: 19.26 - lr: 0.000003
492
+ 2023-01-08 12:37:29,540 epoch 11 - iter 4564/6529 - loss 0.14833497 - samples/sec: 19.08 - lr: 0.000003
493
+ 2023-01-08 12:39:44,148 epoch 11 - iter 5216/6529 - loss 0.14770587 - samples/sec: 19.38 - lr: 0.000003
494
+ 2023-01-08 12:41:57,137 epoch 11 - iter 5868/6529 - loss 0.14687045 - samples/sec: 19.62 - lr: 0.000003
495
+ 2023-01-08 12:44:12,270 epoch 11 - iter 6520/6529 - loss 0.14637472 - samples/sec: 19.31 - lr: 0.000003
496
+ 2023-01-08 12:44:14,156 ----------------------------------------------------------------------------------------------------
497
+ 2023-01-08 12:44:14,161 EPOCH 11 done: loss 0.1464 - lr 0.0000031
498
+ 2023-01-08 12:45:32,296 DEV : loss 0.13083890080451965 - f1-score (micro avg) 0.9023
499
+ 2023-01-08 12:45:32,345 BAD EPOCHS (no improvement): 4
500
+ 2023-01-08 12:45:32,346 ----------------------------------------------------------------------------------------------------
501
+ 2023-01-08 12:47:48,927 epoch 12 - iter 652/6529 - loss 0.13782378 - samples/sec: 19.10 - lr: 0.000003
502
+ 2023-01-08 12:50:06,437 epoch 12 - iter 1304/6529 - loss 0.13900710 - samples/sec: 18.97 - lr: 0.000003
503
+ 2023-01-08 12:52:19,502 epoch 12 - iter 1956/6529 - loss 0.13938580 - samples/sec: 19.61 - lr: 0.000003
504
+ 2023-01-08 12:54:34,968 epoch 12 - iter 2608/6529 - loss 0.14020151 - samples/sec: 19.26 - lr: 0.000003
505
+ 2023-01-08 12:56:52,757 epoch 12 - iter 3260/6529 - loss 0.14277796 - samples/sec: 18.94 - lr: 0.000003
506
+ 2023-01-08 12:59:08,728 epoch 12 - iter 3912/6529 - loss 0.14513185 - samples/sec: 19.19 - lr: 0.000003
507
+ 2023-01-08 13:01:25,805 epoch 12 - iter 4564/6529 - loss 0.14531044 - samples/sec: 19.03 - lr: 0.000003
508
+ 2023-01-08 13:03:41,339 epoch 12 - iter 5216/6529 - loss 0.14480840 - samples/sec: 19.25 - lr: 0.000003
509
+ 2023-01-08 13:05:56,188 epoch 12 - iter 5868/6529 - loss 0.14468314 - samples/sec: 19.35 - lr: 0.000003
510
+ 2023-01-08 13:08:13,187 epoch 12 - iter 6520/6529 - loss 0.14374744 - samples/sec: 19.05 - lr: 0.000003
511
+ 2023-01-08 13:08:15,004 ----------------------------------------------------------------------------------------------------
512
+ 2023-01-08 13:08:15,008 EPOCH 12 done: loss 0.1438 - lr 0.0000029
513
+ 2023-01-08 13:09:29,582 DEV : loss 0.13419032096862793 - f1-score (micro avg) 0.9025
514
+ 2023-01-08 13:09:29,636 BAD EPOCHS (no improvement): 4
515
+ 2023-01-08 13:09:29,638 ----------------------------------------------------------------------------------------------------
516
+ 2023-01-08 13:11:47,888 epoch 13 - iter 652/6529 - loss 0.13602517 - samples/sec: 18.87 - lr: 0.000003
517
+ 2023-01-08 13:14:03,642 epoch 13 - iter 1304/6529 - loss 0.13732952 - samples/sec: 19.22 - lr: 0.000003
518
+ 2023-01-08 13:16:16,517 epoch 13 - iter 1956/6529 - loss 0.13703433 - samples/sec: 19.64 - lr: 0.000003
519
+ 2023-01-08 13:18:32,679 epoch 13 - iter 2608/6529 - loss 0.13862001 - samples/sec: 19.16 - lr: 0.000003
520
+ 2023-01-08 13:20:50,849 epoch 13 - iter 3260/6529 - loss 0.14052905 - samples/sec: 18.88 - lr: 0.000003
521
+ 2023-01-08 13:23:08,216 epoch 13 - iter 3912/6529 - loss 0.14156690 - samples/sec: 18.99 - lr: 0.000003
522
+ 2023-01-08 13:25:24,958 epoch 13 - iter 4564/6529 - loss 0.14102465 - samples/sec: 19.08 - lr: 0.000003
523
+ 2023-01-08 13:27:40,418 epoch 13 - iter 5216/6529 - loss 0.14044374 - samples/sec: 19.26 - lr: 0.000003
524
+ 2023-01-08 13:29:57,587 epoch 13 - iter 5868/6529 - loss 0.14039873 - samples/sec: 19.02 - lr: 0.000003
525
+ 2023-01-08 13:32:16,056 epoch 13 - iter 6520/6529 - loss 0.13956668 - samples/sec: 18.84 - lr: 0.000003
526
+ 2023-01-08 13:32:17,665 ----------------------------------------------------------------------------------------------------
527
+ 2023-01-08 13:32:17,668 EPOCH 13 done: loss 0.1396 - lr 0.0000027
528
+ 2023-01-08 13:33:31,771 DEV : loss 0.13482151925563812 - f1-score (micro avg) 0.9055
529
+ 2023-01-08 13:33:31,821 BAD EPOCHS (no improvement): 4
530
+ 2023-01-08 13:33:31,823 ----------------------------------------------------------------------------------------------------
531
+ 2023-01-08 13:35:50,820 epoch 14 - iter 652/6529 - loss 0.13136383 - samples/sec: 18.77 - lr: 0.000003
532
+ 2023-01-08 13:38:04,351 epoch 14 - iter 1304/6529 - loss 0.13382280 - samples/sec: 19.54 - lr: 0.000003
533
+ 2023-01-08 13:40:18,657 epoch 14 - iter 1956/6529 - loss 0.13488302 - samples/sec: 19.43 - lr: 0.000003
534
+ 2023-01-08 13:42:36,373 epoch 14 - iter 2608/6529 - loss 0.13564871 - samples/sec: 18.95 - lr: 0.000003
535
+ 2023-01-08 13:44:53,302 epoch 14 - iter 3260/6529 - loss 0.13706665 - samples/sec: 19.05 - lr: 0.000003
536
+ 2023-01-08 13:47:09,554 epoch 14 - iter 3912/6529 - loss 0.13866847 - samples/sec: 19.15 - lr: 0.000003
537
+ 2023-01-08 13:49:25,356 epoch 14 - iter 4564/6529 - loss 0.13860764 - samples/sec: 19.21 - lr: 0.000003
538
+ 2023-01-08 13:51:40,558 epoch 14 - iter 5216/6529 - loss 0.13787870 - samples/sec: 19.30 - lr: 0.000002
539
+ 2023-01-08 13:53:57,761 epoch 14 - iter 5868/6529 - loss 0.13779242 - samples/sec: 19.02 - lr: 0.000002
540
+ 2023-01-08 13:56:14,197 epoch 14 - iter 6520/6529 - loss 0.13672301 - samples/sec: 19.12 - lr: 0.000002
541
+ 2023-01-08 13:56:15,980 ----------------------------------------------------------------------------------------------------
542
+ 2023-01-08 13:56:15,983 EPOCH 14 done: loss 0.1367 - lr 0.0000024
543
+ 2023-01-08 13:57:30,521 DEV : loss 0.13973088562488556 - f1-score (micro avg) 0.9037
544
+ 2023-01-08 13:57:30,570 BAD EPOCHS (no improvement): 4
545
+ 2023-01-08 13:57:30,572 ----------------------------------------------------------------------------------------------------
546
+ 2023-01-08 13:59:46,249 epoch 15 - iter 652/6529 - loss 0.13280969 - samples/sec: 19.23 - lr: 0.000002
547
+ 2023-01-08 14:01:59,714 epoch 15 - iter 1304/6529 - loss 0.13297304 - samples/sec: 19.55 - lr: 0.000002
548
+ 2023-01-08 14:04:13,515 epoch 15 - iter 1956/6529 - loss 0.13331323 - samples/sec: 19.50 - lr: 0.000002
549
+ 2023-01-08 14:06:30,305 epoch 15 - iter 2608/6529 - loss 0.13321780 - samples/sec: 19.07 - lr: 0.000002
550
+ 2023-01-08 14:08:46,289 epoch 15 - iter 3260/6529 - loss 0.13439079 - samples/sec: 19.19 - lr: 0.000002
551
+ 2023-01-08 14:11:04,469 epoch 15 - iter 3912/6529 - loss 0.13600843 - samples/sec: 18.88 - lr: 0.000002
552
+ 2023-01-08 14:13:21,293 epoch 15 - iter 4564/6529 - loss 0.13579252 - samples/sec: 19.07 - lr: 0.000002
553
+ 2023-01-08 14:15:33,406 epoch 15 - iter 5216/6529 - loss 0.13553200 - samples/sec: 19.75 - lr: 0.000002
554
+ 2023-01-08 14:17:49,770 epoch 15 - iter 5868/6529 - loss 0.13548036 - samples/sec: 19.13 - lr: 0.000002
555
+ 2023-01-08 14:20:05,553 epoch 15 - iter 6520/6529 - loss 0.13484085 - samples/sec: 19.22 - lr: 0.000002
556
+ 2023-01-08 14:20:07,463 ----------------------------------------------------------------------------------------------------
557
+ 2023-01-08 14:20:07,466 EPOCH 15 done: loss 0.1349 - lr 0.0000022
558
+ 2023-01-08 14:21:24,464 DEV : loss 0.14579473435878754 - f1-score (micro avg) 0.9059
559
+ 2023-01-08 14:21:24,516 BAD EPOCHS (no improvement): 4
560
+ 2023-01-08 14:21:24,518 ----------------------------------------------------------------------------------------------------
561
+ 2023-01-08 14:23:39,037 epoch 16 - iter 652/6529 - loss 0.13068872 - samples/sec: 19.40 - lr: 0.000002
562
+ 2023-01-08 14:25:53,669 epoch 16 - iter 1304/6529 - loss 0.13040826 - samples/sec: 19.38 - lr: 0.000002
563
+ 2023-01-08 14:28:08,991 epoch 16 - iter 1956/6529 - loss 0.13074354 - samples/sec: 19.28 - lr: 0.000002
564
+ 2023-01-08 14:30:23,871 epoch 16 - iter 2608/6529 - loss 0.13159639 - samples/sec: 19.34 - lr: 0.000002
565
+ 2023-01-08 14:32:41,690 epoch 16 - iter 3260/6529 - loss 0.13299574 - samples/sec: 18.93 - lr: 0.000002
566
+ 2023-01-08 14:34:59,097 epoch 16 - iter 3912/6529 - loss 0.13394349 - samples/sec: 18.99 - lr: 0.000002
567
+ 2023-01-08 14:37:15,659 epoch 16 - iter 4564/6529 - loss 0.13395312 - samples/sec: 19.11 - lr: 0.000002
568
+ 2023-01-08 14:39:32,146 epoch 16 - iter 5216/6529 - loss 0.13371950 - samples/sec: 19.12 - lr: 0.000002
569
+ 2023-01-08 14:41:51,422 epoch 16 - iter 5868/6529 - loss 0.13359614 - samples/sec: 18.73 - lr: 0.000002
570
+ 2023-01-08 14:44:09,052 epoch 16 - iter 6520/6529 - loss 0.13321435 - samples/sec: 18.96 - lr: 0.000002
571
+ 2023-01-08 14:44:10,865 ----------------------------------------------------------------------------------------------------
572
+ 2023-01-08 14:44:10,869 EPOCH 16 done: loss 0.1332 - lr 0.0000020
573
+ 2023-01-08 14:45:30,000 DEV : loss 0.14927005767822266 - f1-score (micro avg) 0.9049
574
+ 2023-01-08 14:45:30,051 BAD EPOCHS (no improvement): 4
575
+ 2023-01-08 14:45:30,053 ----------------------------------------------------------------------------------------------------
576
+ 2023-01-08 14:47:46,130 epoch 17 - iter 652/6529 - loss 0.12683164 - samples/sec: 19.17 - lr: 0.000002
577
+ 2023-01-08 14:50:02,615 epoch 17 - iter 1304/6529 - loss 0.12923492 - samples/sec: 19.12 - lr: 0.000002
578
+ 2023-01-08 14:52:17,903 epoch 17 - iter 1956/6529 - loss 0.12797654 - samples/sec: 19.29 - lr: 0.000002
579
+ 2023-01-08 14:54:35,065 epoch 17 - iter 2608/6529 - loss 0.12929489 - samples/sec: 19.02 - lr: 0.000002
580
+ 2023-01-08 14:56:56,125 epoch 17 - iter 3260/6529 - loss 0.12964457 - samples/sec: 18.50 - lr: 0.000002
581
+ 2023-01-08 14:59:15,274 epoch 17 - iter 3912/6529 - loss 0.13117108 - samples/sec: 18.75 - lr: 0.000002
582
+ 2023-01-08 15:01:32,817 epoch 17 - iter 4564/6529 - loss 0.13181821 - samples/sec: 18.97 - lr: 0.000002
583
+ 2023-01-08 15:03:48,960 epoch 17 - iter 5216/6529 - loss 0.13160885 - samples/sec: 19.16 - lr: 0.000002
584
+ 2023-01-08 15:06:06,361 epoch 17 - iter 5868/6529 - loss 0.13181237 - samples/sec: 18.99 - lr: 0.000002
585
+ 2023-01-08 15:08:27,233 epoch 17 - iter 6520/6529 - loss 0.13154532 - samples/sec: 18.52 - lr: 0.000002
586
+ 2023-01-08 15:08:29,140 ----------------------------------------------------------------------------------------------------
587
+ 2023-01-08 15:08:29,143 EPOCH 17 done: loss 0.1315 - lr 0.0000018
588
+ 2023-01-08 15:09:47,453 DEV : loss 0.15806013345718384 - f1-score (micro avg) 0.9069
589
+ 2023-01-08 15:09:47,506 BAD EPOCHS (no improvement): 4
590
+ 2023-01-08 15:09:47,508 ----------------------------------------------------------------------------------------------------
591
+ 2023-01-08 15:12:04,468 epoch 18 - iter 652/6529 - loss 0.12643756 - samples/sec: 19.05 - lr: 0.000002
592
+ 2023-01-08 15:14:20,015 epoch 18 - iter 1304/6529 - loss 0.12885445 - samples/sec: 19.25 - lr: 0.000002
593
+ 2023-01-08 15:16:33,243 epoch 18 - iter 1956/6529 - loss 0.12848283 - samples/sec: 19.58 - lr: 0.000002
594
+ 2023-01-08 15:18:51,096 epoch 18 - iter 2608/6529 - loss 0.12838968 - samples/sec: 18.93 - lr: 0.000002
595
+ 2023-01-08 15:21:10,087 epoch 18 - iter 3260/6529 - loss 0.12981736 - samples/sec: 18.77 - lr: 0.000002
596
+ 2023-01-08 15:23:29,111 epoch 18 - iter 3912/6529 - loss 0.13057931 - samples/sec: 18.77 - lr: 0.000002
597
+ 2023-01-08 15:25:43,545 epoch 18 - iter 4564/6529 - loss 0.13001931 - samples/sec: 19.41 - lr: 0.000002
598
+ 2023-01-08 15:27:54,953 epoch 18 - iter 5216/6529 - loss 0.12977986 - samples/sec: 19.86 - lr: 0.000002
599
+ 2023-01-08 15:30:14,214 epoch 18 - iter 5868/6529 - loss 0.12975537 - samples/sec: 18.74 - lr: 0.000002
600
+ 2023-01-08 15:32:36,219 epoch 18 - iter 6520/6529 - loss 0.12954521 - samples/sec: 18.37 - lr: 0.000002
601
+ 2023-01-08 15:32:38,167 ----------------------------------------------------------------------------------------------------
602
+ 2023-01-08 15:32:38,170 EPOCH 18 done: loss 0.1296 - lr 0.0000016
603
+ 2023-01-08 15:33:54,240 DEV : loss 0.15394894778728485 - f1-score (micro avg) 0.9052
604
+ 2023-01-08 15:33:54,298 BAD EPOCHS (no improvement): 4
605
+ 2023-01-08 15:33:54,299 ----------------------------------------------------------------------------------------------------
606
+ 2023-01-08 15:36:11,665 epoch 19 - iter 652/6529 - loss 0.12689102 - samples/sec: 18.99 - lr: 0.000002
607
+ 2023-01-08 15:38:29,261 epoch 19 - iter 1304/6529 - loss 0.12725324 - samples/sec: 18.96 - lr: 0.000002
608
+ 2023-01-08 15:40:44,876 epoch 19 - iter 1956/6529 - loss 0.12817227 - samples/sec: 19.24 - lr: 0.000001
609
+ 2023-01-08 15:43:06,566 epoch 19 - iter 2608/6529 - loss 0.12814054 - samples/sec: 18.41 - lr: 0.000001
610
+ 2023-01-08 15:45:24,791 epoch 19 - iter 3260/6529 - loss 0.12827850 - samples/sec: 18.88 - lr: 0.000001
611
+ 2023-01-08 15:47:45,814 epoch 19 - iter 3912/6529 - loss 0.12965719 - samples/sec: 18.50 - lr: 0.000001
612
+ 2023-01-08 15:50:07,518 epoch 19 - iter 4564/6529 - loss 0.12988621 - samples/sec: 18.41 - lr: 0.000001
613
+ 2023-01-08 15:52:23,448 epoch 19 - iter 5216/6529 - loss 0.12933048 - samples/sec: 19.19 - lr: 0.000001
614
+ 2023-01-08 15:54:39,323 epoch 19 - iter 5868/6529 - loss 0.12917830 - samples/sec: 19.20 - lr: 0.000001
615
+ 2023-01-08 15:56:59,965 epoch 19 - iter 6520/6529 - loss 0.12867037 - samples/sec: 18.55 - lr: 0.000001
616
+ 2023-01-08 15:57:01,962 ----------------------------------------------------------------------------------------------------
617
+ 2023-01-08 15:57:01,966 EPOCH 19 done: loss 0.1287 - lr 0.0000013
618
+ 2023-01-08 15:58:18,604 DEV : loss 0.16147495806217194 - f1-score (micro avg) 0.9081
619
+ 2023-01-08 15:58:18,654 BAD EPOCHS (no improvement): 4
620
+ 2023-01-08 15:58:18,655 ----------------------------------------------------------------------------------------------------
621
+ 2023-01-08 16:00:36,476 epoch 20 - iter 652/6529 - loss 0.12667596 - samples/sec: 18.93 - lr: 0.000001
622
+ 2023-01-08 16:02:52,155 epoch 20 - iter 1304/6529 - loss 0.12812912 - samples/sec: 19.23 - lr: 0.000001
623
+ 2023-01-08 16:05:08,323 epoch 20 - iter 1956/6529 - loss 0.12789917 - samples/sec: 19.16 - lr: 0.000001
624
+ 2023-01-08 16:07:30,606 epoch 20 - iter 2608/6529 - loss 0.12736327 - samples/sec: 18.34 - lr: 0.000001
625
+ 2023-01-08 16:09:54,711 epoch 20 - iter 3260/6529 - loss 0.12770827 - samples/sec: 18.11 - lr: 0.000001
626
+ 2023-01-08 16:12:15,676 epoch 20 - iter 3912/6529 - loss 0.12872290 - samples/sec: 18.51 - lr: 0.000001
627
+ 2023-01-08 16:14:33,832 epoch 20 - iter 4564/6529 - loss 0.12840905 - samples/sec: 18.89 - lr: 0.000001
628
+ 2023-01-08 16:16:47,092 epoch 20 - iter 5216/6529 - loss 0.12787078 - samples/sec: 19.58 - lr: 0.000001
629
+ 2023-01-08 16:19:11,028 epoch 20 - iter 5868/6529 - loss 0.12740936 - samples/sec: 18.13 - lr: 0.000001
630
+ 2023-01-08 16:21:31,583 epoch 20 - iter 6520/6529 - loss 0.12713130 - samples/sec: 18.56 - lr: 0.000001
631
+ 2023-01-08 16:21:33,402 ----------------------------------------------------------------------------------------------------
632
+ 2023-01-08 16:21:33,405 EPOCH 20 done: loss 0.1271 - lr 0.0000011
633
+ 2023-01-08 16:22:49,932 DEV : loss 0.159995898604393 - f1-score (micro avg) 0.9074
634
+ 2023-01-08 16:22:49,984 BAD EPOCHS (no improvement): 4
635
+ 2023-01-08 16:22:49,987 ----------------------------------------------------------------------------------------------------
636
+ 2023-01-08 16:25:14,175 epoch 21 - iter 652/6529 - loss 0.12640983 - samples/sec: 18.10 - lr: 0.000001
637
+ 2023-01-08 16:27:31,741 epoch 21 - iter 1304/6529 - loss 0.12615217 - samples/sec: 18.97 - lr: 0.000001
638
+ 2023-01-08 16:29:44,470 epoch 21 - iter 1956/6529 - loss 0.12594671 - samples/sec: 19.66 - lr: 0.000001
639
+ 2023-01-08 16:32:00,986 epoch 21 - iter 2608/6529 - loss 0.12622617 - samples/sec: 19.11 - lr: 0.000001
640
+ 2023-01-08 16:34:16,973 epoch 21 - iter 3260/6529 - loss 0.12678245 - samples/sec: 19.19 - lr: 0.000001
641
+ 2023-01-08 16:36:36,020 epoch 21 - iter 3912/6529 - loss 0.12658296 - samples/sec: 18.76 - lr: 0.000001
642
+ 2023-01-08 16:38:57,014 epoch 21 - iter 4564/6529 - loss 0.12632625 - samples/sec: 18.51 - lr: 0.000001
643
+ 2023-01-08 16:41:11,449 epoch 21 - iter 5216/6529 - loss 0.12593025 - samples/sec: 19.41 - lr: 0.000001
644
+ 2023-01-08 16:43:32,645 epoch 21 - iter 5868/6529 - loss 0.12583151 - samples/sec: 18.48 - lr: 0.000001
645
+ 2023-01-08 16:45:56,623 epoch 21 - iter 6520/6529 - loss 0.12527594 - samples/sec: 18.12 - lr: 0.000001
646
+ 2023-01-08 16:45:58,452 ----------------------------------------------------------------------------------------------------
647
+ 2023-01-08 16:45:58,455 EPOCH 21 done: loss 0.1253 - lr 0.0000009
648
+ 2023-01-08 16:47:16,916 DEV : loss 0.16192322969436646 - f1-score (micro avg) 0.9084
649
+ 2023-01-08 16:47:16,966 BAD EPOCHS (no improvement): 4
650
+ 2023-01-08 16:47:16,968 ----------------------------------------------------------------------------------------------------
651
+ 2023-01-08 16:49:36,861 epoch 22 - iter 652/6529 - loss 0.12905441 - samples/sec: 18.65 - lr: 0.000001
652
+ 2023-01-08 16:51:51,836 epoch 22 - iter 1304/6529 - loss 0.12623396 - samples/sec: 19.33 - lr: 0.000001
653
+ 2023-01-08 16:54:09,617 epoch 22 - iter 1956/6529 - loss 0.12598739 - samples/sec: 18.94 - lr: 0.000001
654
+ 2023-01-08 16:56:26,735 epoch 22 - iter 2608/6529 - loss 0.12629697 - samples/sec: 19.03 - lr: 0.000001
655
+ 2023-01-08 16:58:48,363 epoch 22 - iter 3260/6529 - loss 0.12612927 - samples/sec: 18.42 - lr: 0.000001
656
+ 2023-01-08 17:01:07,899 epoch 22 - iter 3912/6529 - loss 0.12713583 - samples/sec: 18.70 - lr: 0.000001
657
+ 2023-01-08 17:03:25,452 epoch 22 - iter 4564/6529 - loss 0.12733269 - samples/sec: 18.97 - lr: 0.000001
658
+ 2023-01-08 17:05:38,176 epoch 22 - iter 5216/6529 - loss 0.12691323 - samples/sec: 19.66 - lr: 0.000001
659
+ 2023-01-08 17:07:56,590 epoch 22 - iter 5868/6529 - loss 0.12649450 - samples/sec: 18.85 - lr: 0.000001
660
+ 2023-01-08 17:10:19,008 epoch 22 - iter 6520/6529 - loss 0.12612071 - samples/sec: 18.32 - lr: 0.000001
661
+ 2023-01-08 17:10:20,935 ----------------------------------------------------------------------------------------------------
662
+ 2023-01-08 17:10:20,938 EPOCH 22 done: loss 0.1261 - lr 0.0000007
663
+ 2023-01-08 17:11:39,673 DEV : loss 0.160598024725914 - f1-score (micro avg) 0.9095
664
+ 2023-01-08 17:11:39,726 BAD EPOCHS (no improvement): 4
665
+ 2023-01-08 17:11:39,727 ----------------------------------------------------------------------------------------------------
666
+ 2023-01-08 17:13:59,008 epoch 23 - iter 652/6529 - loss 0.12423740 - samples/sec: 18.73 - lr: 0.000001
667
+ 2023-01-08 17:16:15,015 epoch 23 - iter 1304/6529 - loss 0.12636817 - samples/sec: 19.18 - lr: 0.000001
668
+ 2023-01-08 17:18:30,677 epoch 23 - iter 1956/6529 - loss 0.12724886 - samples/sec: 19.23 - lr: 0.000001
669
+ 2023-01-08 17:20:43,439 epoch 23 - iter 2608/6529 - loss 0.12629184 - samples/sec: 19.65 - lr: 0.000001
670
+ 2023-01-08 17:23:01,405 epoch 23 - iter 3260/6529 - loss 0.12601784 - samples/sec: 18.91 - lr: 0.000001
671
+ 2023-01-08 17:25:13,272 epoch 23 - iter 3912/6529 - loss 0.12569715 - samples/sec: 19.79 - lr: 0.000001
672
+ 2023-01-08 17:27:32,247 epoch 23 - iter 4564/6529 - loss 0.12589309 - samples/sec: 18.77 - lr: 0.000001
673
+ 2023-01-08 17:29:49,487 epoch 23 - iter 5216/6529 - loss 0.12574754 - samples/sec: 19.01 - lr: 0.000000
674
+ 2023-01-08 17:32:05,380 epoch 23 - iter 5868/6529 - loss 0.12524206 - samples/sec: 19.20 - lr: 0.000000
675
+ 2023-01-08 17:34:23,467 epoch 23 - iter 6520/6529 - loss 0.12483199 - samples/sec: 18.89 - lr: 0.000000
676
+ 2023-01-08 17:34:25,411 ----------------------------------------------------------------------------------------------------
677
+ 2023-01-08 17:34:25,414 EPOCH 23 done: loss 0.1248 - lr 0.0000004
678
+ 2023-01-08 17:35:39,808 DEV : loss 0.161932572722435 - f1-score (micro avg) 0.9107
679
+ 2023-01-08 17:35:39,860 BAD EPOCHS (no improvement): 4
680
+ 2023-01-08 17:35:39,861 ----------------------------------------------------------------------------------------------------
681
+ 2023-01-08 17:38:02,048 epoch 24 - iter 652/6529 - loss 0.12624568 - samples/sec: 18.35 - lr: 0.000000
682
+ 2023-01-08 17:40:21,980 epoch 24 - iter 1304/6529 - loss 0.12467521 - samples/sec: 18.65 - lr: 0.000000
683
+ 2023-01-08 17:42:40,219 epoch 24 - iter 1956/6529 - loss 0.12463381 - samples/sec: 18.87 - lr: 0.000000
684
+ 2023-01-08 17:45:00,289 epoch 24 - iter 2608/6529 - loss 0.12397927 - samples/sec: 18.63 - lr: 0.000000
685
+ 2023-01-08 17:47:21,805 epoch 24 - iter 3260/6529 - loss 0.12523877 - samples/sec: 18.44 - lr: 0.000000
686
+ 2023-01-08 17:49:40,558 epoch 24 - iter 3912/6529 - loss 0.12577788 - samples/sec: 18.80 - lr: 0.000000
687
+ 2023-01-08 17:51:56,485 epoch 24 - iter 4564/6529 - loss 0.12575965 - samples/sec: 19.20 - lr: 0.000000
688
+ 2023-01-08 17:54:10,689 epoch 24 - iter 5216/6529 - loss 0.12505960 - samples/sec: 19.44 - lr: 0.000000
689
+ 2023-01-08 17:56:28,819 epoch 24 - iter 5868/6529 - loss 0.12454837 - samples/sec: 18.89 - lr: 0.000000
690
+ 2023-01-08 17:58:56,244 epoch 24 - iter 6520/6529 - loss 0.12429321 - samples/sec: 17.70 - lr: 0.000000
691
+ 2023-01-08 17:58:58,354 ----------------------------------------------------------------------------------------------------
692
+ 2023-01-08 17:58:58,357 EPOCH 24 done: loss 0.1243 - lr 0.0000002
693
+ 2023-01-08 18:00:13,633 DEV : loss 0.16107264161109924 - f1-score (micro avg) 0.9111
694
+ 2023-01-08 18:00:13,688 BAD EPOCHS (no improvement): 4
695
+ 2023-01-08 18:00:13,690 ----------------------------------------------------------------------------------------------------
696
+ 2023-01-08 18:02:30,863 epoch 25 - iter 652/6529 - loss 0.12185023 - samples/sec: 19.02 - lr: 0.000000
697
+ 2023-01-08 18:04:48,105 epoch 25 - iter 1304/6529 - loss 0.12151675 - samples/sec: 19.01 - lr: 0.000000
698
+ 2023-01-08 18:07:03,845 epoch 25 - iter 1956/6529 - loss 0.12293666 - samples/sec: 19.22 - lr: 0.000000
699
+ 2023-01-08 18:09:20,797 epoch 25 - iter 2608/6529 - loss 0.12248209 - samples/sec: 19.05 - lr: 0.000000
700
+ 2023-01-08 18:11:38,782 epoch 25 - iter 3260/6529 - loss 0.12236612 - samples/sec: 18.91 - lr: 0.000000
701
+ 2023-01-08 18:13:57,739 epoch 25 - iter 3912/6529 - loss 0.12284535 - samples/sec: 18.78 - lr: 0.000000
702
+ 2023-01-08 18:16:19,460 epoch 25 - iter 4564/6529 - loss 0.12312537 - samples/sec: 18.41 - lr: 0.000000
703
+ 2023-01-08 18:18:34,844 epoch 25 - iter 5216/6529 - loss 0.12315613 - samples/sec: 19.27 - lr: 0.000000
704
+ 2023-01-08 18:20:52,724 epoch 25 - iter 5868/6529 - loss 0.12280164 - samples/sec: 18.92 - lr: 0.000000
705
+ 2023-01-08 18:23:11,733 epoch 25 - iter 6520/6529 - loss 0.12286952 - samples/sec: 18.77 - lr: 0.000000
706
+ 2023-01-08 18:23:13,587 ----------------------------------------------------------------------------------------------------
707
+ 2023-01-08 18:23:13,590 EPOCH 25 done: loss 0.1229 - lr 0.0000000
708
+ 2023-01-08 18:24:28,587 DEV : loss 0.1607247292995453 - f1-score (micro avg) 0.9119
709
+ 2023-01-08 18:24:28,641 BAD EPOCHS (no improvement): 4
710
+ 2023-01-08 18:24:29,854 ----------------------------------------------------------------------------------------------------
711
+ 2023-01-08 18:24:29,857 Testing using last state of model ...
712
+ 2023-01-08 18:25:11,654 0.9081 0.8984 0.9033 0.8277
713
+ 2023-01-08 18:25:11,656
714
+ Results:
715
+ - F-score (micro) 0.9033
716
+ - F-score (macro) 0.8976
717
+ - Accuracy 0.8277
718
+
719
+ By class:
720
+ precision recall f1-score support
721
+
722
+ ORG 0.9016 0.8667 0.8838 1523
723
+ LOC 0.9113 0.9305 0.9208 1425
724
+ PER 0.9216 0.9322 0.9269 1224
725
+ DAT 0.8623 0.7958 0.8277 480
726
+ MON 0.9665 0.9558 0.9611 181
727
+ PCT 0.9375 0.9740 0.9554 77
728
+ TIM 0.8235 0.7925 0.8077 53
729
+
730
+ micro avg 0.9081 0.8984 0.9033 4963
731
+ macro avg 0.9035 0.8925 0.8976 4963
732
+ weighted avg 0.9076 0.8984 0.9028 4963
733
+ samples avg 0.8277 0.8277 0.8277 4963
734
+
735
+ 2023-01-08 18:25:11,656 ----------------------------------------------------------------------------------------------------