stefan-it commited on
Commit
07e79d1
1 Parent(s): 67d70bc

Upload ./training.log with huggingface_hub

Browse files
Files changed (1) hide show
  1. training.log +507 -0
training.log ADDED
@@ -0,0 +1,507 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 2023-10-24 16:55:43,926 ----------------------------------------------------------------------------------------------------
2
+ 2023-10-24 16:55:43,927 Model: "SequenceTagger(
3
+ (embeddings): TransformerWordEmbeddings(
4
+ (model): BertModel(
5
+ (embeddings): BertEmbeddings(
6
+ (word_embeddings): Embedding(64001, 768)
7
+ (position_embeddings): Embedding(512, 768)
8
+ (token_type_embeddings): Embedding(2, 768)
9
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
10
+ (dropout): Dropout(p=0.1, inplace=False)
11
+ )
12
+ (encoder): BertEncoder(
13
+ (layer): ModuleList(
14
+ (0): BertLayer(
15
+ (attention): BertAttention(
16
+ (self): BertSelfAttention(
17
+ (query): Linear(in_features=768, out_features=768, bias=True)
18
+ (key): Linear(in_features=768, out_features=768, bias=True)
19
+ (value): Linear(in_features=768, out_features=768, bias=True)
20
+ (dropout): Dropout(p=0.1, inplace=False)
21
+ )
22
+ (output): BertSelfOutput(
23
+ (dense): Linear(in_features=768, out_features=768, bias=True)
24
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
25
+ (dropout): Dropout(p=0.1, inplace=False)
26
+ )
27
+ )
28
+ (intermediate): BertIntermediate(
29
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
30
+ (intermediate_act_fn): GELUActivation()
31
+ )
32
+ (output): BertOutput(
33
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
34
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
35
+ (dropout): Dropout(p=0.1, inplace=False)
36
+ )
37
+ )
38
+ (1): BertLayer(
39
+ (attention): BertAttention(
40
+ (self): BertSelfAttention(
41
+ (query): Linear(in_features=768, out_features=768, bias=True)
42
+ (key): Linear(in_features=768, out_features=768, bias=True)
43
+ (value): Linear(in_features=768, out_features=768, bias=True)
44
+ (dropout): Dropout(p=0.1, inplace=False)
45
+ )
46
+ (output): BertSelfOutput(
47
+ (dense): Linear(in_features=768, out_features=768, bias=True)
48
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
49
+ (dropout): Dropout(p=0.1, inplace=False)
50
+ )
51
+ )
52
+ (intermediate): BertIntermediate(
53
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
54
+ (intermediate_act_fn): GELUActivation()
55
+ )
56
+ (output): BertOutput(
57
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
58
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
59
+ (dropout): Dropout(p=0.1, inplace=False)
60
+ )
61
+ )
62
+ (2): BertLayer(
63
+ (attention): BertAttention(
64
+ (self): BertSelfAttention(
65
+ (query): Linear(in_features=768, out_features=768, bias=True)
66
+ (key): Linear(in_features=768, out_features=768, bias=True)
67
+ (value): Linear(in_features=768, out_features=768, bias=True)
68
+ (dropout): Dropout(p=0.1, inplace=False)
69
+ )
70
+ (output): BertSelfOutput(
71
+ (dense): Linear(in_features=768, out_features=768, bias=True)
72
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
73
+ (dropout): Dropout(p=0.1, inplace=False)
74
+ )
75
+ )
76
+ (intermediate): BertIntermediate(
77
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
78
+ (intermediate_act_fn): GELUActivation()
79
+ )
80
+ (output): BertOutput(
81
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
82
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
83
+ (dropout): Dropout(p=0.1, inplace=False)
84
+ )
85
+ )
86
+ (3): BertLayer(
87
+ (attention): BertAttention(
88
+ (self): BertSelfAttention(
89
+ (query): Linear(in_features=768, out_features=768, bias=True)
90
+ (key): Linear(in_features=768, out_features=768, bias=True)
91
+ (value): Linear(in_features=768, out_features=768, bias=True)
92
+ (dropout): Dropout(p=0.1, inplace=False)
93
+ )
94
+ (output): BertSelfOutput(
95
+ (dense): Linear(in_features=768, out_features=768, bias=True)
96
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
97
+ (dropout): Dropout(p=0.1, inplace=False)
98
+ )
99
+ )
100
+ (intermediate): BertIntermediate(
101
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
102
+ (intermediate_act_fn): GELUActivation()
103
+ )
104
+ (output): BertOutput(
105
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
106
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
107
+ (dropout): Dropout(p=0.1, inplace=False)
108
+ )
109
+ )
110
+ (4): BertLayer(
111
+ (attention): BertAttention(
112
+ (self): BertSelfAttention(
113
+ (query): Linear(in_features=768, out_features=768, bias=True)
114
+ (key): Linear(in_features=768, out_features=768, bias=True)
115
+ (value): Linear(in_features=768, out_features=768, bias=True)
116
+ (dropout): Dropout(p=0.1, inplace=False)
117
+ )
118
+ (output): BertSelfOutput(
119
+ (dense): Linear(in_features=768, out_features=768, bias=True)
120
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
121
+ (dropout): Dropout(p=0.1, inplace=False)
122
+ )
123
+ )
124
+ (intermediate): BertIntermediate(
125
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
126
+ (intermediate_act_fn): GELUActivation()
127
+ )
128
+ (output): BertOutput(
129
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
130
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
131
+ (dropout): Dropout(p=0.1, inplace=False)
132
+ )
133
+ )
134
+ (5): BertLayer(
135
+ (attention): BertAttention(
136
+ (self): BertSelfAttention(
137
+ (query): Linear(in_features=768, out_features=768, bias=True)
138
+ (key): Linear(in_features=768, out_features=768, bias=True)
139
+ (value): Linear(in_features=768, out_features=768, bias=True)
140
+ (dropout): Dropout(p=0.1, inplace=False)
141
+ )
142
+ (output): BertSelfOutput(
143
+ (dense): Linear(in_features=768, out_features=768, bias=True)
144
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
145
+ (dropout): Dropout(p=0.1, inplace=False)
146
+ )
147
+ )
148
+ (intermediate): BertIntermediate(
149
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
150
+ (intermediate_act_fn): GELUActivation()
151
+ )
152
+ (output): BertOutput(
153
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
154
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
155
+ (dropout): Dropout(p=0.1, inplace=False)
156
+ )
157
+ )
158
+ (6): BertLayer(
159
+ (attention): BertAttention(
160
+ (self): BertSelfAttention(
161
+ (query): Linear(in_features=768, out_features=768, bias=True)
162
+ (key): Linear(in_features=768, out_features=768, bias=True)
163
+ (value): Linear(in_features=768, out_features=768, bias=True)
164
+ (dropout): Dropout(p=0.1, inplace=False)
165
+ )
166
+ (output): BertSelfOutput(
167
+ (dense): Linear(in_features=768, out_features=768, bias=True)
168
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
169
+ (dropout): Dropout(p=0.1, inplace=False)
170
+ )
171
+ )
172
+ (intermediate): BertIntermediate(
173
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
174
+ (intermediate_act_fn): GELUActivation()
175
+ )
176
+ (output): BertOutput(
177
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
178
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
179
+ (dropout): Dropout(p=0.1, inplace=False)
180
+ )
181
+ )
182
+ (7): BertLayer(
183
+ (attention): BertAttention(
184
+ (self): BertSelfAttention(
185
+ (query): Linear(in_features=768, out_features=768, bias=True)
186
+ (key): Linear(in_features=768, out_features=768, bias=True)
187
+ (value): Linear(in_features=768, out_features=768, bias=True)
188
+ (dropout): Dropout(p=0.1, inplace=False)
189
+ )
190
+ (output): BertSelfOutput(
191
+ (dense): Linear(in_features=768, out_features=768, bias=True)
192
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
193
+ (dropout): Dropout(p=0.1, inplace=False)
194
+ )
195
+ )
196
+ (intermediate): BertIntermediate(
197
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
198
+ (intermediate_act_fn): GELUActivation()
199
+ )
200
+ (output): BertOutput(
201
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
202
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
203
+ (dropout): Dropout(p=0.1, inplace=False)
204
+ )
205
+ )
206
+ (8): BertLayer(
207
+ (attention): BertAttention(
208
+ (self): BertSelfAttention(
209
+ (query): Linear(in_features=768, out_features=768, bias=True)
210
+ (key): Linear(in_features=768, out_features=768, bias=True)
211
+ (value): Linear(in_features=768, out_features=768, bias=True)
212
+ (dropout): Dropout(p=0.1, inplace=False)
213
+ )
214
+ (output): BertSelfOutput(
215
+ (dense): Linear(in_features=768, out_features=768, bias=True)
216
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
217
+ (dropout): Dropout(p=0.1, inplace=False)
218
+ )
219
+ )
220
+ (intermediate): BertIntermediate(
221
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
222
+ (intermediate_act_fn): GELUActivation()
223
+ )
224
+ (output): BertOutput(
225
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
226
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
227
+ (dropout): Dropout(p=0.1, inplace=False)
228
+ )
229
+ )
230
+ (9): BertLayer(
231
+ (attention): BertAttention(
232
+ (self): BertSelfAttention(
233
+ (query): Linear(in_features=768, out_features=768, bias=True)
234
+ (key): Linear(in_features=768, out_features=768, bias=True)
235
+ (value): Linear(in_features=768, out_features=768, bias=True)
236
+ (dropout): Dropout(p=0.1, inplace=False)
237
+ )
238
+ (output): BertSelfOutput(
239
+ (dense): Linear(in_features=768, out_features=768, bias=True)
240
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
241
+ (dropout): Dropout(p=0.1, inplace=False)
242
+ )
243
+ )
244
+ (intermediate): BertIntermediate(
245
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
246
+ (intermediate_act_fn): GELUActivation()
247
+ )
248
+ (output): BertOutput(
249
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
250
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
251
+ (dropout): Dropout(p=0.1, inplace=False)
252
+ )
253
+ )
254
+ (10): BertLayer(
255
+ (attention): BertAttention(
256
+ (self): BertSelfAttention(
257
+ (query): Linear(in_features=768, out_features=768, bias=True)
258
+ (key): Linear(in_features=768, out_features=768, bias=True)
259
+ (value): Linear(in_features=768, out_features=768, bias=True)
260
+ (dropout): Dropout(p=0.1, inplace=False)
261
+ )
262
+ (output): BertSelfOutput(
263
+ (dense): Linear(in_features=768, out_features=768, bias=True)
264
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
265
+ (dropout): Dropout(p=0.1, inplace=False)
266
+ )
267
+ )
268
+ (intermediate): BertIntermediate(
269
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
270
+ (intermediate_act_fn): GELUActivation()
271
+ )
272
+ (output): BertOutput(
273
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
274
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
275
+ (dropout): Dropout(p=0.1, inplace=False)
276
+ )
277
+ )
278
+ (11): BertLayer(
279
+ (attention): BertAttention(
280
+ (self): BertSelfAttention(
281
+ (query): Linear(in_features=768, out_features=768, bias=True)
282
+ (key): Linear(in_features=768, out_features=768, bias=True)
283
+ (value): Linear(in_features=768, out_features=768, bias=True)
284
+ (dropout): Dropout(p=0.1, inplace=False)
285
+ )
286
+ (output): BertSelfOutput(
287
+ (dense): Linear(in_features=768, out_features=768, bias=True)
288
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
289
+ (dropout): Dropout(p=0.1, inplace=False)
290
+ )
291
+ )
292
+ (intermediate): BertIntermediate(
293
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
294
+ (intermediate_act_fn): GELUActivation()
295
+ )
296
+ (output): BertOutput(
297
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
298
+ (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
299
+ (dropout): Dropout(p=0.1, inplace=False)
300
+ )
301
+ )
302
+ )
303
+ )
304
+ (pooler): BertPooler(
305
+ (dense): Linear(in_features=768, out_features=768, bias=True)
306
+ (activation): Tanh()
307
+ )
308
+ )
309
+ )
310
+ (locked_dropout): LockedDropout(p=0.5)
311
+ (linear): Linear(in_features=768, out_features=13, bias=True)
312
+ (loss_function): CrossEntropyLoss()
313
+ )"
314
+ 2023-10-24 16:55:43,927 ----------------------------------------------------------------------------------------------------
315
+ 2023-10-24 16:55:43,927 MultiCorpus: 7936 train + 992 dev + 992 test sentences
316
+ - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /home/ubuntu/.flair/datasets/ner_icdar_europeana/fr
317
+ 2023-10-24 16:55:43,927 ----------------------------------------------------------------------------------------------------
318
+ 2023-10-24 16:55:43,927 Train: 7936 sentences
319
+ 2023-10-24 16:55:43,927 (train_with_dev=False, train_with_test=False)
320
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
321
+ 2023-10-24 16:55:43,928 Training Params:
322
+ 2023-10-24 16:55:43,928 - learning_rate: "3e-05"
323
+ 2023-10-24 16:55:43,928 - mini_batch_size: "4"
324
+ 2023-10-24 16:55:43,928 - max_epochs: "10"
325
+ 2023-10-24 16:55:43,928 - shuffle: "True"
326
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
327
+ 2023-10-24 16:55:43,928 Plugins:
328
+ 2023-10-24 16:55:43,928 - TensorboardLogger
329
+ 2023-10-24 16:55:43,928 - LinearScheduler | warmup_fraction: '0.1'
330
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
331
+ 2023-10-24 16:55:43,928 Final evaluation on model from best epoch (best-model.pt)
332
+ 2023-10-24 16:55:43,928 - metric: "('micro avg', 'f1-score')"
333
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
334
+ 2023-10-24 16:55:43,928 Computation:
335
+ 2023-10-24 16:55:43,928 - compute on device: cuda:0
336
+ 2023-10-24 16:55:43,928 - embedding storage: none
337
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
338
+ 2023-10-24 16:55:43,928 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
339
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
340
+ 2023-10-24 16:55:43,928 ----------------------------------------------------------------------------------------------------
341
+ 2023-10-24 16:55:43,928 Logging anything other than scalars to TensorBoard is currently not supported.
342
+ 2023-10-24 16:55:56,135 epoch 1 - iter 198/1984 - loss 1.38943938 - time (sec): 12.21 - samples/sec: 1429.33 - lr: 0.000003 - momentum: 0.000000
343
+ 2023-10-24 16:56:08,160 epoch 1 - iter 396/1984 - loss 0.87338015 - time (sec): 24.23 - samples/sec: 1389.55 - lr: 0.000006 - momentum: 0.000000
344
+ 2023-10-24 16:56:20,110 epoch 1 - iter 594/1984 - loss 0.65982021 - time (sec): 36.18 - samples/sec: 1357.07 - lr: 0.000009 - momentum: 0.000000
345
+ 2023-10-24 16:56:32,204 epoch 1 - iter 792/1984 - loss 0.53649000 - time (sec): 48.28 - samples/sec: 1359.51 - lr: 0.000012 - momentum: 0.000000
346
+ 2023-10-24 16:56:44,187 epoch 1 - iter 990/1984 - loss 0.46413907 - time (sec): 60.26 - samples/sec: 1349.64 - lr: 0.000015 - momentum: 0.000000
347
+ 2023-10-24 16:56:56,148 epoch 1 - iter 1188/1984 - loss 0.41329533 - time (sec): 72.22 - samples/sec: 1344.79 - lr: 0.000018 - momentum: 0.000000
348
+ 2023-10-24 16:57:08,421 epoch 1 - iter 1386/1984 - loss 0.36950313 - time (sec): 84.49 - samples/sec: 1347.81 - lr: 0.000021 - momentum: 0.000000
349
+ 2023-10-24 16:57:20,569 epoch 1 - iter 1584/1984 - loss 0.33972792 - time (sec): 96.64 - samples/sec: 1348.12 - lr: 0.000024 - momentum: 0.000000
350
+ 2023-10-24 16:57:32,795 epoch 1 - iter 1782/1984 - loss 0.31934420 - time (sec): 108.87 - samples/sec: 1353.02 - lr: 0.000027 - momentum: 0.000000
351
+ 2023-10-24 16:57:44,981 epoch 1 - iter 1980/1984 - loss 0.30139055 - time (sec): 121.05 - samples/sec: 1351.60 - lr: 0.000030 - momentum: 0.000000
352
+ 2023-10-24 16:57:45,234 ----------------------------------------------------------------------------------------------------
353
+ 2023-10-24 16:57:45,234 EPOCH 1 done: loss 0.3010 - lr: 0.000030
354
+ 2023-10-24 16:57:48,301 DEV : loss 0.08988756686449051 - f1-score (micro avg) 0.7331
355
+ 2023-10-24 16:57:48,316 saving best model
356
+ 2023-10-24 16:57:48,785 ----------------------------------------------------------------------------------------------------
357
+ 2023-10-24 16:58:00,799 epoch 2 - iter 198/1984 - loss 0.11015239 - time (sec): 12.01 - samples/sec: 1357.08 - lr: 0.000030 - momentum: 0.000000
358
+ 2023-10-24 16:58:12,903 epoch 2 - iter 396/1984 - loss 0.11165230 - time (sec): 24.12 - samples/sec: 1347.03 - lr: 0.000029 - momentum: 0.000000
359
+ 2023-10-24 16:58:24,975 epoch 2 - iter 594/1984 - loss 0.11377525 - time (sec): 36.19 - samples/sec: 1350.17 - lr: 0.000029 - momentum: 0.000000
360
+ 2023-10-24 16:58:37,241 epoch 2 - iter 792/1984 - loss 0.11506086 - time (sec): 48.46 - samples/sec: 1353.84 - lr: 0.000029 - momentum: 0.000000
361
+ 2023-10-24 16:58:49,374 epoch 2 - iter 990/1984 - loss 0.11305507 - time (sec): 60.59 - samples/sec: 1360.09 - lr: 0.000028 - momentum: 0.000000
362
+ 2023-10-24 16:59:01,455 epoch 2 - iter 1188/1984 - loss 0.11198699 - time (sec): 72.67 - samples/sec: 1358.52 - lr: 0.000028 - momentum: 0.000000
363
+ 2023-10-24 16:59:13,628 epoch 2 - iter 1386/1984 - loss 0.11098977 - time (sec): 84.84 - samples/sec: 1361.77 - lr: 0.000028 - momentum: 0.000000
364
+ 2023-10-24 16:59:25,721 epoch 2 - iter 1584/1984 - loss 0.10937736 - time (sec): 96.94 - samples/sec: 1353.64 - lr: 0.000027 - momentum: 0.000000
365
+ 2023-10-24 16:59:37,810 epoch 2 - iter 1782/1984 - loss 0.11058550 - time (sec): 109.02 - samples/sec: 1350.06 - lr: 0.000027 - momentum: 0.000000
366
+ 2023-10-24 16:59:50,060 epoch 2 - iter 1980/1984 - loss 0.11259310 - time (sec): 121.27 - samples/sec: 1350.04 - lr: 0.000027 - momentum: 0.000000
367
+ 2023-10-24 16:59:50,293 ----------------------------------------------------------------------------------------------------
368
+ 2023-10-24 16:59:50,293 EPOCH 2 done: loss 0.1125 - lr: 0.000027
369
+ 2023-10-24 16:59:53,708 DEV : loss 0.09264685958623886 - f1-score (micro avg) 0.7178
370
+ 2023-10-24 16:59:53,723 ----------------------------------------------------------------------------------------------------
371
+ 2023-10-24 17:00:05,764 epoch 3 - iter 198/1984 - loss 0.07194458 - time (sec): 12.04 - samples/sec: 1344.06 - lr: 0.000026 - momentum: 0.000000
372
+ 2023-10-24 17:00:17,986 epoch 3 - iter 396/1984 - loss 0.07937385 - time (sec): 24.26 - samples/sec: 1357.81 - lr: 0.000026 - momentum: 0.000000
373
+ 2023-10-24 17:00:30,051 epoch 3 - iter 594/1984 - loss 0.08581189 - time (sec): 36.33 - samples/sec: 1345.29 - lr: 0.000026 - momentum: 0.000000
374
+ 2023-10-24 17:00:42,040 epoch 3 - iter 792/1984 - loss 0.08360831 - time (sec): 48.32 - samples/sec: 1340.89 - lr: 0.000025 - momentum: 0.000000
375
+ 2023-10-24 17:00:54,457 epoch 3 - iter 990/1984 - loss 0.08135214 - time (sec): 60.73 - samples/sec: 1356.85 - lr: 0.000025 - momentum: 0.000000
376
+ 2023-10-24 17:01:06,626 epoch 3 - iter 1188/1984 - loss 0.08288010 - time (sec): 72.90 - samples/sec: 1355.03 - lr: 0.000025 - momentum: 0.000000
377
+ 2023-10-24 17:01:18,609 epoch 3 - iter 1386/1984 - loss 0.08319010 - time (sec): 84.89 - samples/sec: 1351.84 - lr: 0.000024 - momentum: 0.000000
378
+ 2023-10-24 17:01:30,806 epoch 3 - iter 1584/1984 - loss 0.08252110 - time (sec): 97.08 - samples/sec: 1352.96 - lr: 0.000024 - momentum: 0.000000
379
+ 2023-10-24 17:01:42,960 epoch 3 - iter 1782/1984 - loss 0.08183765 - time (sec): 109.24 - samples/sec: 1353.28 - lr: 0.000024 - momentum: 0.000000
380
+ 2023-10-24 17:01:54,990 epoch 3 - iter 1980/1984 - loss 0.08232756 - time (sec): 121.27 - samples/sec: 1350.29 - lr: 0.000023 - momentum: 0.000000
381
+ 2023-10-24 17:01:55,227 ----------------------------------------------------------------------------------------------------
382
+ 2023-10-24 17:01:55,228 EPOCH 3 done: loss 0.0823 - lr: 0.000023
383
+ 2023-10-24 17:01:58,337 DEV : loss 0.12425895780324936 - f1-score (micro avg) 0.748
384
+ 2023-10-24 17:01:58,353 saving best model
385
+ 2023-10-24 17:01:58,959 ----------------------------------------------------------------------------------------------------
386
+ 2023-10-24 17:02:10,932 epoch 4 - iter 198/1984 - loss 0.05660289 - time (sec): 11.97 - samples/sec: 1320.40 - lr: 0.000023 - momentum: 0.000000
387
+ 2023-10-24 17:02:23,327 epoch 4 - iter 396/1984 - loss 0.06137809 - time (sec): 24.37 - samples/sec: 1345.54 - lr: 0.000023 - momentum: 0.000000
388
+ 2023-10-24 17:02:35,490 epoch 4 - iter 594/1984 - loss 0.05994790 - time (sec): 36.53 - samples/sec: 1347.81 - lr: 0.000022 - momentum: 0.000000
389
+ 2023-10-24 17:02:47,476 epoch 4 - iter 792/1984 - loss 0.06293485 - time (sec): 48.52 - samples/sec: 1342.62 - lr: 0.000022 - momentum: 0.000000
390
+ 2023-10-24 17:02:59,577 epoch 4 - iter 990/1984 - loss 0.06173995 - time (sec): 60.62 - samples/sec: 1346.36 - lr: 0.000022 - momentum: 0.000000
391
+ 2023-10-24 17:03:11,344 epoch 4 - iter 1188/1984 - loss 0.06025572 - time (sec): 72.38 - samples/sec: 1331.03 - lr: 0.000021 - momentum: 0.000000
392
+ 2023-10-24 17:03:23,543 epoch 4 - iter 1386/1984 - loss 0.06154385 - time (sec): 84.58 - samples/sec: 1341.57 - lr: 0.000021 - momentum: 0.000000
393
+ 2023-10-24 17:03:35,705 epoch 4 - iter 1584/1984 - loss 0.06060378 - time (sec): 96.74 - samples/sec: 1340.90 - lr: 0.000021 - momentum: 0.000000
394
+ 2023-10-24 17:03:48,022 epoch 4 - iter 1782/1984 - loss 0.06089033 - time (sec): 109.06 - samples/sec: 1341.13 - lr: 0.000020 - momentum: 0.000000
395
+ 2023-10-24 17:04:00,486 epoch 4 - iter 1980/1984 - loss 0.06041258 - time (sec): 121.53 - samples/sec: 1346.67 - lr: 0.000020 - momentum: 0.000000
396
+ 2023-10-24 17:04:00,727 ----------------------------------------------------------------------------------------------------
397
+ 2023-10-24 17:04:00,727 EPOCH 4 done: loss 0.0603 - lr: 0.000020
398
+ 2023-10-24 17:04:03,851 DEV : loss 0.17354941368103027 - f1-score (micro avg) 0.7452
399
+ 2023-10-24 17:04:03,866 ----------------------------------------------------------------------------------------------------
400
+ 2023-10-24 17:04:16,120 epoch 5 - iter 198/1984 - loss 0.03818146 - time (sec): 12.25 - samples/sec: 1386.56 - lr: 0.000020 - momentum: 0.000000
401
+ 2023-10-24 17:04:28,224 epoch 5 - iter 396/1984 - loss 0.04086454 - time (sec): 24.36 - samples/sec: 1355.49 - lr: 0.000019 - momentum: 0.000000
402
+ 2023-10-24 17:04:40,492 epoch 5 - iter 594/1984 - loss 0.04353732 - time (sec): 36.62 - samples/sec: 1353.93 - lr: 0.000019 - momentum: 0.000000
403
+ 2023-10-24 17:04:52,589 epoch 5 - iter 792/1984 - loss 0.04314748 - time (sec): 48.72 - samples/sec: 1339.17 - lr: 0.000019 - momentum: 0.000000
404
+ 2023-10-24 17:05:04,686 epoch 5 - iter 990/1984 - loss 0.04590035 - time (sec): 60.82 - samples/sec: 1347.08 - lr: 0.000018 - momentum: 0.000000
405
+ 2023-10-24 17:05:16,672 epoch 5 - iter 1188/1984 - loss 0.04501377 - time (sec): 72.81 - samples/sec: 1343.70 - lr: 0.000018 - momentum: 0.000000
406
+ 2023-10-24 17:05:28,989 epoch 5 - iter 1386/1984 - loss 0.04487474 - time (sec): 85.12 - samples/sec: 1348.08 - lr: 0.000018 - momentum: 0.000000
407
+ 2023-10-24 17:05:41,166 epoch 5 - iter 1584/1984 - loss 0.04660773 - time (sec): 97.30 - samples/sec: 1347.44 - lr: 0.000017 - momentum: 0.000000
408
+ 2023-10-24 17:05:53,109 epoch 5 - iter 1782/1984 - loss 0.04691609 - time (sec): 109.24 - samples/sec: 1345.32 - lr: 0.000017 - momentum: 0.000000
409
+ 2023-10-24 17:06:05,294 epoch 5 - iter 1980/1984 - loss 0.04565686 - time (sec): 121.43 - samples/sec: 1347.64 - lr: 0.000017 - momentum: 0.000000
410
+ 2023-10-24 17:06:05,541 ----------------------------------------------------------------------------------------------------
411
+ 2023-10-24 17:06:05,541 EPOCH 5 done: loss 0.0458 - lr: 0.000017
412
+ 2023-10-24 17:06:08,662 DEV : loss 0.21091219782829285 - f1-score (micro avg) 0.7348
413
+ 2023-10-24 17:06:08,678 ----------------------------------------------------------------------------------------------------
414
+ 2023-10-24 17:06:20,950 epoch 6 - iter 198/1984 - loss 0.03634734 - time (sec): 12.27 - samples/sec: 1323.11 - lr: 0.000016 - momentum: 0.000000
415
+ 2023-10-24 17:06:33,123 epoch 6 - iter 396/1984 - loss 0.03594311 - time (sec): 24.44 - samples/sec: 1350.94 - lr: 0.000016 - momentum: 0.000000
416
+ 2023-10-24 17:06:45,222 epoch 6 - iter 594/1984 - loss 0.03443057 - time (sec): 36.54 - samples/sec: 1356.66 - lr: 0.000016 - momentum: 0.000000
417
+ 2023-10-24 17:06:57,225 epoch 6 - iter 792/1984 - loss 0.03374808 - time (sec): 48.55 - samples/sec: 1356.48 - lr: 0.000015 - momentum: 0.000000
418
+ 2023-10-24 17:07:09,782 epoch 6 - iter 990/1984 - loss 0.03467893 - time (sec): 61.10 - samples/sec: 1352.93 - lr: 0.000015 - momentum: 0.000000
419
+ 2023-10-24 17:07:21,887 epoch 6 - iter 1188/1984 - loss 0.03450278 - time (sec): 73.21 - samples/sec: 1347.25 - lr: 0.000015 - momentum: 0.000000
420
+ 2023-10-24 17:07:33,957 epoch 6 - iter 1386/1984 - loss 0.03433692 - time (sec): 85.28 - samples/sec: 1341.65 - lr: 0.000014 - momentum: 0.000000
421
+ 2023-10-24 17:07:46,058 epoch 6 - iter 1584/1984 - loss 0.03334634 - time (sec): 97.38 - samples/sec: 1342.51 - lr: 0.000014 - momentum: 0.000000
422
+ 2023-10-24 17:07:58,090 epoch 6 - iter 1782/1984 - loss 0.03396585 - time (sec): 109.41 - samples/sec: 1336.77 - lr: 0.000014 - momentum: 0.000000
423
+ 2023-10-24 17:08:10,180 epoch 6 - iter 1980/1984 - loss 0.03474326 - time (sec): 121.50 - samples/sec: 1347.16 - lr: 0.000013 - momentum: 0.000000
424
+ 2023-10-24 17:08:10,422 ----------------------------------------------------------------------------------------------------
425
+ 2023-10-24 17:08:10,422 EPOCH 6 done: loss 0.0347 - lr: 0.000013
426
+ 2023-10-24 17:08:13,549 DEV : loss 0.1887310892343521 - f1-score (micro avg) 0.7538
427
+ 2023-10-24 17:08:13,565 saving best model
428
+ 2023-10-24 17:08:14,156 ----------------------------------------------------------------------------------------------------
429
+ 2023-10-24 17:08:26,505 epoch 7 - iter 198/1984 - loss 0.02827359 - time (sec): 12.35 - samples/sec: 1359.01 - lr: 0.000013 - momentum: 0.000000
430
+ 2023-10-24 17:08:38,484 epoch 7 - iter 396/1984 - loss 0.02865567 - time (sec): 24.33 - samples/sec: 1334.37 - lr: 0.000013 - momentum: 0.000000
431
+ 2023-10-24 17:08:50,648 epoch 7 - iter 594/1984 - loss 0.02475648 - time (sec): 36.49 - samples/sec: 1335.51 - lr: 0.000012 - momentum: 0.000000
432
+ 2023-10-24 17:09:02,801 epoch 7 - iter 792/1984 - loss 0.02504839 - time (sec): 48.64 - samples/sec: 1324.60 - lr: 0.000012 - momentum: 0.000000
433
+ 2023-10-24 17:09:14,867 epoch 7 - iter 990/1984 - loss 0.02489200 - time (sec): 60.71 - samples/sec: 1323.82 - lr: 0.000012 - momentum: 0.000000
434
+ 2023-10-24 17:09:27,252 epoch 7 - iter 1188/1984 - loss 0.02447758 - time (sec): 73.10 - samples/sec: 1338.12 - lr: 0.000011 - momentum: 0.000000
435
+ 2023-10-24 17:09:39,473 epoch 7 - iter 1386/1984 - loss 0.02467051 - time (sec): 85.32 - samples/sec: 1344.90 - lr: 0.000011 - momentum: 0.000000
436
+ 2023-10-24 17:09:51,517 epoch 7 - iter 1584/1984 - loss 0.02483831 - time (sec): 97.36 - samples/sec: 1345.72 - lr: 0.000011 - momentum: 0.000000
437
+ 2023-10-24 17:10:03,564 epoch 7 - iter 1782/1984 - loss 0.02499026 - time (sec): 109.41 - samples/sec: 1346.91 - lr: 0.000010 - momentum: 0.000000
438
+ 2023-10-24 17:10:15,630 epoch 7 - iter 1980/1984 - loss 0.02535832 - time (sec): 121.47 - samples/sec: 1346.06 - lr: 0.000010 - momentum: 0.000000
439
+ 2023-10-24 17:10:15,884 ----------------------------------------------------------------------------------------------------
440
+ 2023-10-24 17:10:15,884 EPOCH 7 done: loss 0.0253 - lr: 0.000010
441
+ 2023-10-24 17:10:19,005 DEV : loss 0.2231946587562561 - f1-score (micro avg) 0.7507
442
+ 2023-10-24 17:10:19,021 ----------------------------------------------------------------------------------------------------
443
+ 2023-10-24 17:10:31,676 epoch 8 - iter 198/1984 - loss 0.01362913 - time (sec): 12.65 - samples/sec: 1371.16 - lr: 0.000010 - momentum: 0.000000
444
+ 2023-10-24 17:10:43,986 epoch 8 - iter 396/1984 - loss 0.01453365 - time (sec): 24.96 - samples/sec: 1368.42 - lr: 0.000009 - momentum: 0.000000
445
+ 2023-10-24 17:10:55,962 epoch 8 - iter 594/1984 - loss 0.01371635 - time (sec): 36.94 - samples/sec: 1345.54 - lr: 0.000009 - momentum: 0.000000
446
+ 2023-10-24 17:11:08,188 epoch 8 - iter 792/1984 - loss 0.01363124 - time (sec): 49.17 - samples/sec: 1338.60 - lr: 0.000009 - momentum: 0.000000
447
+ 2023-10-24 17:11:20,225 epoch 8 - iter 990/1984 - loss 0.01455714 - time (sec): 61.20 - samples/sec: 1334.80 - lr: 0.000008 - momentum: 0.000000
448
+ 2023-10-24 17:11:32,426 epoch 8 - iter 1188/1984 - loss 0.01600626 - time (sec): 73.40 - samples/sec: 1346.37 - lr: 0.000008 - momentum: 0.000000
449
+ 2023-10-24 17:11:44,542 epoch 8 - iter 1386/1984 - loss 0.01607764 - time (sec): 85.52 - samples/sec: 1348.15 - lr: 0.000008 - momentum: 0.000000
450
+ 2023-10-24 17:11:56,395 epoch 8 - iter 1584/1984 - loss 0.01596874 - time (sec): 97.37 - samples/sec: 1339.42 - lr: 0.000007 - momentum: 0.000000
451
+ 2023-10-24 17:12:08,622 epoch 8 - iter 1782/1984 - loss 0.01600345 - time (sec): 109.60 - samples/sec: 1340.49 - lr: 0.000007 - momentum: 0.000000
452
+ 2023-10-24 17:12:20,768 epoch 8 - iter 1980/1984 - loss 0.01670679 - time (sec): 121.75 - samples/sec: 1344.01 - lr: 0.000007 - momentum: 0.000000
453
+ 2023-10-24 17:12:21,005 ----------------------------------------------------------------------------------------------------
454
+ 2023-10-24 17:12:21,005 EPOCH 8 done: loss 0.0167 - lr: 0.000007
455
+ 2023-10-24 17:12:24,126 DEV : loss 0.2260325700044632 - f1-score (micro avg) 0.7547
456
+ 2023-10-24 17:12:24,142 saving best model
457
+ 2023-10-24 17:12:24,734 ----------------------------------------------------------------------------------------------------
458
+ 2023-10-24 17:12:36,802 epoch 9 - iter 198/1984 - loss 0.01167945 - time (sec): 12.07 - samples/sec: 1314.60 - lr: 0.000006 - momentum: 0.000000
459
+ 2023-10-24 17:12:48,856 epoch 9 - iter 396/1984 - loss 0.01028989 - time (sec): 24.12 - samples/sec: 1310.79 - lr: 0.000006 - momentum: 0.000000
460
+ 2023-10-24 17:13:00,869 epoch 9 - iter 594/1984 - loss 0.01244219 - time (sec): 36.13 - samples/sec: 1307.66 - lr: 0.000006 - momentum: 0.000000
461
+ 2023-10-24 17:13:13,465 epoch 9 - iter 792/1984 - loss 0.01152707 - time (sec): 48.73 - samples/sec: 1326.87 - lr: 0.000005 - momentum: 0.000000
462
+ 2023-10-24 17:13:25,772 epoch 9 - iter 990/1984 - loss 0.01079042 - time (sec): 61.04 - samples/sec: 1339.77 - lr: 0.000005 - momentum: 0.000000
463
+ 2023-10-24 17:13:38,011 epoch 9 - iter 1188/1984 - loss 0.01084398 - time (sec): 73.28 - samples/sec: 1342.98 - lr: 0.000005 - momentum: 0.000000
464
+ 2023-10-24 17:13:50,001 epoch 9 - iter 1386/1984 - loss 0.01095687 - time (sec): 85.27 - samples/sec: 1342.23 - lr: 0.000004 - momentum: 0.000000
465
+ 2023-10-24 17:14:02,064 epoch 9 - iter 1584/1984 - loss 0.01043359 - time (sec): 97.33 - samples/sec: 1342.13 - lr: 0.000004 - momentum: 0.000000
466
+ 2023-10-24 17:14:14,071 epoch 9 - iter 1782/1984 - loss 0.01070738 - time (sec): 109.34 - samples/sec: 1343.09 - lr: 0.000004 - momentum: 0.000000
467
+ 2023-10-24 17:14:26,147 epoch 9 - iter 1980/1984 - loss 0.01099379 - time (sec): 121.41 - samples/sec: 1348.31 - lr: 0.000003 - momentum: 0.000000
468
+ 2023-10-24 17:14:26,381 ----------------------------------------------------------------------------------------------------
469
+ 2023-10-24 17:14:26,381 EPOCH 9 done: loss 0.0110 - lr: 0.000003
470
+ 2023-10-24 17:14:29,825 DEV : loss 0.24125918745994568 - f1-score (micro avg) 0.7639
471
+ 2023-10-24 17:14:29,841 saving best model
472
+ 2023-10-24 17:14:30,454 ----------------------------------------------------------------------------------------------------
473
+ 2023-10-24 17:14:42,411 epoch 10 - iter 198/1984 - loss 0.00345145 - time (sec): 11.96 - samples/sec: 1355.66 - lr: 0.000003 - momentum: 0.000000
474
+ 2023-10-24 17:14:54,503 epoch 10 - iter 396/1984 - loss 0.00398821 - time (sec): 24.05 - samples/sec: 1344.86 - lr: 0.000003 - momentum: 0.000000
475
+ 2023-10-24 17:15:06,668 epoch 10 - iter 594/1984 - loss 0.00606419 - time (sec): 36.21 - samples/sec: 1355.76 - lr: 0.000002 - momentum: 0.000000
476
+ 2023-10-24 17:15:18,803 epoch 10 - iter 792/1984 - loss 0.00655537 - time (sec): 48.35 - samples/sec: 1368.69 - lr: 0.000002 - momentum: 0.000000
477
+ 2023-10-24 17:15:30,830 epoch 10 - iter 990/1984 - loss 0.00666700 - time (sec): 60.38 - samples/sec: 1363.80 - lr: 0.000002 - momentum: 0.000000
478
+ 2023-10-24 17:15:42,958 epoch 10 - iter 1188/1984 - loss 0.00648338 - time (sec): 72.50 - samples/sec: 1355.06 - lr: 0.000001 - momentum: 0.000000
479
+ 2023-10-24 17:15:55,093 epoch 10 - iter 1386/1984 - loss 0.00695280 - time (sec): 84.64 - samples/sec: 1354.00 - lr: 0.000001 - momentum: 0.000000
480
+ 2023-10-24 17:16:07,044 epoch 10 - iter 1584/1984 - loss 0.00694583 - time (sec): 96.59 - samples/sec: 1348.02 - lr: 0.000001 - momentum: 0.000000
481
+ 2023-10-24 17:16:19,281 epoch 10 - iter 1782/1984 - loss 0.00731465 - time (sec): 108.83 - samples/sec: 1348.88 - lr: 0.000000 - momentum: 0.000000
482
+ 2023-10-24 17:16:31,552 epoch 10 - iter 1980/1984 - loss 0.00719445 - time (sec): 121.10 - samples/sec: 1351.26 - lr: 0.000000 - momentum: 0.000000
483
+ 2023-10-24 17:16:31,799 ----------------------------------------------------------------------------------------------------
484
+ 2023-10-24 17:16:31,799 EPOCH 10 done: loss 0.0072 - lr: 0.000000
485
+ 2023-10-24 17:16:34,920 DEV : loss 0.24395139515399933 - f1-score (micro avg) 0.7747
486
+ 2023-10-24 17:16:34,936 saving best model
487
+ 2023-10-24 17:16:35,995 ----------------------------------------------------------------------------------------------------
488
+ 2023-10-24 17:16:35,996 Loading model from best epoch ...
489
+ 2023-10-24 17:16:37,468 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
490
+ 2023-10-24 17:16:40,539
491
+ Results:
492
+ - F-score (micro) 0.7847
493
+ - F-score (macro) 0.7007
494
+ - Accuracy 0.6667
495
+
496
+ By class:
497
+ precision recall f1-score support
498
+
499
+ LOC 0.8338 0.8580 0.8457 655
500
+ PER 0.6923 0.8072 0.7453 223
501
+ ORG 0.5800 0.4567 0.5110 127
502
+
503
+ micro avg 0.7737 0.7960 0.7847 1005
504
+ macro avg 0.7020 0.7073 0.7007 1005
505
+ weighted avg 0.7704 0.7960 0.7812 1005
506
+
507
+ 2023-10-24 17:16:40,539 ----------------------------------------------------------------------------------------------------