pjox committed
Commit
7ca3dda
Parent: 8e4495d

Uploaded the model

Files changed (6)
  1. dev.tsv +0 -0
  2. final-model.pt +3 -0
  3. loss.tsv +11 -0
  4. test.tsv +0 -0
  5. training.log +499 -0
  6. weights.txt +0 -0
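
Note: final-model.pt is tracked with Git LFS (see its pointer file below), so fetching it requires either git-lfs or the Hub client. A minimal sketch using huggingface_hub; the repo_id is a placeholder for the repository this commit belongs to, not taken from this page:

```python
from huggingface_hub import hf_hub_download

# repo_id is a placeholder; substitute the actual repository of this commit.
path = hf_hub_download(repo_id="user/model", filename="final-model.pt")
print(path)  # local cache path of the ~445 MB checkpoint
```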
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:540573dc73c98f5b1fa42c789f4465e7f1b2c7f326d7461dfdeada7d2522644b
+ size 444998061
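
Note: the pointer records only the checkpoint's SHA-256 and size (444,998,061 bytes, about 445 MB); the blob itself lives in LFS storage. A small standard-library sketch for verifying a downloaded copy against the pointer:

```python
import hashlib

# Expected digest, copied from the LFS pointer above.
EXPECTED = "540573dc73c98f5b1fa42c789f4465e7f1b2c7f326d7461dfdeada7d2522644b"

# Hash in chunks to avoid loading the ~445 MB file into memory at once.
sha = hashlib.sha256()
with open("final-model.pt", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

assert sha.hexdigest() == EXPECTED, "checkpoint does not match the LFS pointer"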
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 02:16:47 4 0.0001 0.2182895945059601 0.0355144739151001 0.7661 0.8947 0.8254 0.722
+ 2 03:26:13 4 0.0000 0.13729935181138425 0.015243684872984886 0.9007 0.926 0.9132 0.8548
+ 3 04:35:09 4 0.0000 0.11197359439314927 0.016585879027843475 0.9119 0.9342 0.9229 0.8697
+ 4 05:44:00 4 0.0000 0.09147635538963178 0.016923826187849045 0.9132 0.9296 0.9213 0.8708
+ 5 06:52:15 4 0.0000 0.07495889990317275 0.017464155331254005 0.9377 0.9246 0.9311 0.8831
+ 6 08:00:45 4 0.0000 0.061747689342078395 0.01982131227850914 0.9348 0.9369 0.9358 0.8909
+ 7 09:09:20 4 0.0000 0.0519030773124998 0.02467426098883152 0.9395 0.9315 0.9355 0.892
+ 8 10:18:00 4 0.0000 0.04503195115695853 0.02364770695567131 0.9306 0.9438 0.9371 0.8939
+ 9 11:26:37 4 0.0000 0.040509963133028556 0.026182951405644417 0.9328 0.9394 0.9361 0.8942
+ 10 12:34:45 4 0.0000 0.03798332489249556 0.027400659397244453 0.9349 0.9388 0.9368 0.8943
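
Note: dev F1 climbs from 0.8254 after epoch 1 to a peak of 0.9371 at epoch 8, while dev loss bottoms out at epoch 2 and rises afterwards even as train loss keeps falling. A minimal sketch for inspecting the table, assuming it is tab-separated as the extension suggests:

```python
import pandas as pd

# loss.tsv as written by Flair's trainer; tab-separated per the extension.
log = pd.read_csv("loss.tsv", sep="\t")

best = log.loc[log["DEV_F1"].idxmax()]
print(f"best dev F1 {best['DEV_F1']:.4f} at epoch {int(best['EPOCH'])}")
# With the values above this prints epoch 8 (dev F1 0.9371).
```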
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,499 @@
+ 2022-02-05 01:08:47,419 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 01:08:47,461 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): RobertaModel(
+       (embeddings): RobertaEmbeddings(
+         (word_embeddings): Embedding(32768, 768, padding_idx=1)
+         (position_embeddings): Embedding(514, 768, padding_idx=1)
+         (token_type_embeddings): Embedding(1, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): RobertaEncoder(
+         (layer): ModuleList(
+           (0): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (1): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (2): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (3): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (4): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (5): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (6): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (7): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (8): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (9): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (10): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (11): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): RobertaPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (word_dropout): WordDropout(p=0.05)
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=18, bias=True)
+   (beta): 1.0
+   (weights): None
+   (weight_tensor) None
+ )"
+ 2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
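
Note: the printout describes a base-size RoBERTa encoder (12 layers, hidden size 768, 32,768-token vocabulary) wrapped in Flair's TransformerWordEmbeddings, topped by an 18-way linear tag head. A sketch, using Hugging Face transformers, of a config matching these shapes; the parameter count it yields (~111M) is consistent with the 444,998,061-byte float32 checkpoint above:

```python
from transformers import RobertaConfig, RobertaModel

# Shapes copied from the printout: 32768-token vocab, 514 positions,
# 12 layers of hidden size 768 with 3072-wide feed-forward blocks.
# 12 attention heads assumed (standard for hidden size 768; the printout
# shows only the fused query/key/value Linears, not the head count).
config = RobertaConfig(
    vocab_size=32768,
    max_position_embeddings=514,
    type_vocab_size=1,
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    intermediate_size=3072,
)
model = RobertaModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # ~111M; at 4 bytes each that is ~445 MB, matching final-model.pt
```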
+ 2022-02-05 01:08:47,466 Corpus: "Corpus: 126973 train + 7037 dev + 7090 test sentences"
+ 2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 01:08:47,466 Parameters:
+ 2022-02-05 01:08:47,466  - learning_rate: "5e-05"
+ 2022-02-05 01:08:47,466  - mini_batch_size: "16"
+ 2022-02-05 01:08:47,466  - patience: "3"
+ 2022-02-05 01:08:47,466  - anneal_factor: "0.5"
+ 2022-02-05 01:08:47,466  - max_epochs: "10"
+ 2022-02-05 01:08:47,466  - shuffle: "True"
+ 2022-02-05 01:08:47,466  - train_with_dev: "False"
+ 2022-02-05 01:08:47,466  - batch_growth_annealing: "False"
+ 2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 01:08:47,466 Model training base path: "resources/taggers/ner-dalembert-2ndtry"
+ 2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 01:08:47,466 Device: cuda:0
+ 2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 01:08:47,467 Embeddings storage mode: none
+ 2022-02-05 01:08:47,469 ----------------------------------------------------------------------------------------------------
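
Note: these parameters read like a Flair ModelTrainer run; the learning-rate trace below (warming up through epoch 1, then decaying linearly to zero) additionally suggests a linear scheduler such as Flair's fine-tuning path, which the parameter block does not record. A hypothetical sketch of such a setup; the corpus directory, column layout, and embedding checkpoint name are placeholders, not taken from this commit:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Placeholder corpus: directory and column layout are assumptions.
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

# Placeholder checkpoint name; the log only shows a RobertaModel inside
# TransformerWordEmbeddings.
embeddings = TransformerWordEmbeddings("path/to/roberta-checkpoint", fine_tune=True)

# No RNN/CRF head: the printout above ends in a single Linear(768, 18).
tagger = SequenceTagger(
    hidden_size=256,  # unused with use_rnn=False
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "resources/taggers/ner-dalembert-2ndtry",  # base path from the log
    learning_rate=5e-05,
    mini_batch_size=16,
    max_epochs=10,
    patience=3,
    anneal_factor=0.5,
    shuffle=True,
    train_with_dev=False,
    batch_growth_annealing=False,
    embeddings_storage_mode="none",
)
```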
+ 2022-02-05 01:15:08,771 epoch 1 - iter 793/7936 - loss 0.78007372 - samples/sec: 33.28 - lr: 0.000005
+ 2022-02-05 01:22:45,940 epoch 1 - iter 1586/7936 - loss 0.41932043 - samples/sec: 27.76 - lr: 0.000010
+ 2022-02-05 01:29:23,897 epoch 1 - iter 2379/7936 - loss 0.33514542 - samples/sec: 31.89 - lr: 0.000015
+ 2022-02-05 01:35:24,915 epoch 1 - iter 3172/7936 - loss 0.30212998 - samples/sec: 35.15 - lr: 0.000020
+ 2022-02-05 01:42:28,297 epoch 1 - iter 3965/7936 - loss 0.27341208 - samples/sec: 29.97 - lr: 0.000025
+ 2022-02-05 01:49:23,543 epoch 1 - iter 4758/7936 - loss 0.25403588 - samples/sec: 30.56 - lr: 0.000030
+ 2022-02-05 01:55:46,783 epoch 1 - iter 5551/7936 - loss 0.24241496 - samples/sec: 33.11 - lr: 0.000035
+ 2022-02-05 02:01:45,654 epoch 1 - iter 6344/7936 - loss 0.23381719 - samples/sec: 35.36 - lr: 0.000040
+ 2022-02-05 02:07:29,407 epoch 1 - iter 7137/7936 - loss 0.22586308 - samples/sec: 36.92 - lr: 0.000045
+ 2022-02-05 02:13:54,603 epoch 1 - iter 7930/7936 - loss 0.21834611 - samples/sec: 32.94 - lr: 0.000050
+ 2022-02-05 02:13:57,692 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 02:13:57,693 EPOCH 1 done: loss 0.2183 - lr 0.0000500
+ 2022-02-05 02:16:47,190 DEV : loss 0.0355144739151001 - f1-score (micro avg) 0.8254
+ 2022-02-05 02:16:47,244 BAD EPOCHS (no improvement): 4
+ 2022-02-05 02:16:47,244 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 02:23:15,435 epoch 2 - iter 793/7936 - loss 0.14903310 - samples/sec: 32.69 - lr: 0.000049
+ 2022-02-05 02:30:06,605 epoch 2 - iter 1586/7936 - loss 0.14777394 - samples/sec: 30.86 - lr: 0.000049
+ 2022-02-05 02:36:48,570 epoch 2 - iter 2379/7936 - loss 0.14637300 - samples/sec: 31.57 - lr: 0.000048
+ 2022-02-05 02:43:37,172 epoch 2 - iter 3172/7936 - loss 0.14491485 - samples/sec: 31.06 - lr: 0.000048
+ 2022-02-05 02:50:13,040 epoch 2 - iter 3965/7936 - loss 0.14361996 - samples/sec: 32.06 - lr: 0.000047
+ 2022-02-05 02:56:49,904 epoch 2 - iter 4758/7936 - loss 0.14232123 - samples/sec: 31.98 - lr: 0.000047
+ 2022-02-05 03:03:34,383 epoch 2 - iter 5551/7936 - loss 0.14116820 - samples/sec: 31.38 - lr: 0.000046
+ 2022-02-05 03:10:09,778 epoch 2 - iter 6344/7936 - loss 0.14001072 - samples/sec: 32.10 - lr: 0.000046
+ 2022-02-05 03:16:43,847 epoch 2 - iter 7137/7936 - loss 0.13868572 - samples/sec: 32.20 - lr: 0.000045
+ 2022-02-05 03:23:28,994 epoch 2 - iter 7930/7936 - loss 0.13731517 - samples/sec: 31.33 - lr: 0.000044
+ 2022-02-05 03:23:31,622 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 03:23:31,623 EPOCH 2 done: loss 0.1373 - lr 0.0000444
+ 2022-02-05 03:26:13,727 DEV : loss 0.015243684872984886 - f1-score (micro avg) 0.9132
+ 2022-02-05 03:26:13,788 BAD EPOCHS (no improvement): 4
+ 2022-02-05 03:26:13,806 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 03:32:57,765 epoch 3 - iter 793/7936 - loss 0.11924788 - samples/sec: 31.42 - lr: 0.000044
+ 2022-02-05 03:39:33,229 epoch 3 - iter 1586/7936 - loss 0.11867811 - samples/sec: 32.09 - lr: 0.000043
+ 2022-02-05 03:46:09,619 epoch 3 - iter 2379/7936 - loss 0.11819415 - samples/sec: 32.01 - lr: 0.000043
+ 2022-02-05 03:52:49,510 epoch 3 - iter 3172/7936 - loss 0.11779082 - samples/sec: 31.74 - lr: 0.000042
+ 2022-02-05 03:59:27,917 epoch 3 - iter 3965/7936 - loss 0.11691604 - samples/sec: 31.85 - lr: 0.000042
+ 2022-02-05 04:06:01,365 epoch 3 - iter 4758/7936 - loss 0.11592267 - samples/sec: 32.26 - lr: 0.000041
+ 2022-02-05 04:12:41,174 epoch 3 - iter 5551/7936 - loss 0.11480043 - samples/sec: 31.74 - lr: 0.000041
+ 2022-02-05 04:19:14,243 epoch 3 - iter 6344/7936 - loss 0.11389582 - samples/sec: 32.29 - lr: 0.000040
+ 2022-02-05 04:25:45,192 epoch 3 - iter 7137/7936 - loss 0.11289267 - samples/sec: 32.46 - lr: 0.000039
+ 2022-02-05 04:32:26,310 epoch 3 - iter 7930/7936 - loss 0.11196899 - samples/sec: 31.64 - lr: 0.000039
+ 2022-02-05 04:32:29,352 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 04:32:29,353 EPOCH 3 done: loss 0.1120 - lr 0.0000389
+ 2022-02-05 04:35:09,639 DEV : loss 0.016585879027843475 - f1-score (micro avg) 0.9229
+ 2022-02-05 04:35:09,698 BAD EPOCHS (no improvement): 4
+ 2022-02-05 04:35:09,698 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 04:41:46,821 epoch 4 - iter 793/7936 - loss 0.09739851 - samples/sec: 31.96 - lr: 0.000038
+ 2022-02-05 04:48:23,504 epoch 4 - iter 1586/7936 - loss 0.09750632 - samples/sec: 31.99 - lr: 0.000038
+ 2022-02-05 04:55:05,833 epoch 4 - iter 2379/7936 - loss 0.09636659 - samples/sec: 31.54 - lr: 0.000037
+ 2022-02-05 05:01:34,951 epoch 4 - iter 3172/7936 - loss 0.09583742 - samples/sec: 32.61 - lr: 0.000037
+ 2022-02-05 05:08:07,163 epoch 4 - iter 3965/7936 - loss 0.09518243 - samples/sec: 32.36 - lr: 0.000036
+ 2022-02-05 05:14:50,781 epoch 4 - iter 4758/7936 - loss 0.09444265 - samples/sec: 31.44 - lr: 0.000036
+ 2022-02-05 05:21:24,983 epoch 4 - iter 5551/7936 - loss 0.09374740 - samples/sec: 32.19 - lr: 0.000035
+ 2022-02-05 05:27:54,052 epoch 4 - iter 6344/7936 - loss 0.09321236 - samples/sec: 32.62 - lr: 0.000034
+ 2022-02-05 05:34:32,228 epoch 4 - iter 7137/7936 - loss 0.09231997 - samples/sec: 31.87 - lr: 0.000034
+ 2022-02-05 05:41:08,580 epoch 4 - iter 7930/7936 - loss 0.09147929 - samples/sec: 32.02 - lr: 0.000033
+ 2022-02-05 05:41:11,479 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 05:41:11,479 EPOCH 4 done: loss 0.0915 - lr 0.0000333
+ 2022-02-05 05:44:00,197 DEV : loss 0.016923826187849045 - f1-score (micro avg) 0.9213
+ 2022-02-05 05:44:00,256 BAD EPOCHS (no improvement): 4
+ 2022-02-05 05:44:00,270 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 05:50:27,537 epoch 5 - iter 793/7936 - loss 0.07986125 - samples/sec: 32.77 - lr: 0.000033
+ 2022-02-05 05:56:56,203 epoch 5 - iter 1586/7936 - loss 0.08031745 - samples/sec: 32.65 - lr: 0.000032
+ 2022-02-05 06:03:34,109 epoch 5 - iter 2379/7936 - loss 0.07984185 - samples/sec: 31.89 - lr: 0.000032
+ 2022-02-05 06:10:03,550 epoch 5 - iter 3172/7936 - loss 0.07905074 - samples/sec: 32.59 - lr: 0.000031
+ 2022-02-05 06:16:30,085 epoch 5 - iter 3965/7936 - loss 0.07843193 - samples/sec: 32.83 - lr: 0.000031
+ 2022-02-05 06:23:10,671 epoch 5 - iter 4758/7936 - loss 0.07785540 - samples/sec: 31.68 - lr: 0.000030
+ 2022-02-05 06:29:45,063 epoch 5 - iter 5551/7936 - loss 0.07709413 - samples/sec: 32.18 - lr: 0.000029
+ 2022-02-05 06:36:23,513 epoch 5 - iter 6344/7936 - loss 0.07634510 - samples/sec: 31.85 - lr: 0.000029
+ 2022-02-05 06:42:51,615 epoch 5 - iter 7137/7936 - loss 0.07566508 - samples/sec: 32.70 - lr: 0.000028
+ 2022-02-05 06:49:23,409 epoch 5 - iter 7930/7936 - loss 0.07495508 - samples/sec: 32.39 - lr: 0.000028
+ 2022-02-05 06:49:26,372 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 06:49:26,373 EPOCH 5 done: loss 0.0750 - lr 0.0000278
+ 2022-02-05 06:52:15,459 DEV : loss 0.017464155331254005 - f1-score (micro avg) 0.9311
+ 2022-02-05 06:52:15,518 BAD EPOCHS (no improvement): 4
+ 2022-02-05 06:52:15,518 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 06:58:49,072 epoch 6 - iter 793/7936 - loss 0.06552824 - samples/sec: 32.25 - lr: 0.000027
+ 2022-02-05 07:05:27,796 epoch 6 - iter 1586/7936 - loss 0.06569517 - samples/sec: 31.83 - lr: 0.000027
+ 2022-02-05 07:11:58,162 epoch 6 - iter 2379/7936 - loss 0.06536467 - samples/sec: 32.51 - lr: 0.000026
+ 2022-02-05 07:18:25,878 epoch 6 - iter 3172/7936 - loss 0.06467146 - samples/sec: 32.73 - lr: 0.000026
+ 2022-02-05 07:25:10,562 epoch 6 - iter 3965/7936 - loss 0.06426965 - samples/sec: 31.36 - lr: 0.000025
+ 2022-02-05 07:31:39,437 epoch 6 - iter 4758/7936 - loss 0.06371305 - samples/sec: 32.63 - lr: 0.000024
+ 2022-02-05 07:38:08,323 epoch 6 - iter 5551/7936 - loss 0.06328229 - samples/sec: 32.63 - lr: 0.000024
+ 2022-02-05 07:44:52,176 epoch 6 - iter 6344/7936 - loss 0.06272143 - samples/sec: 31.42 - lr: 0.000023
+ 2022-02-05 07:51:20,507 epoch 6 - iter 7137/7936 - loss 0.06218937 - samples/sec: 32.68 - lr: 0.000023
+ 2022-02-05 07:57:52,828 epoch 6 - iter 7930/7936 - loss 0.06175113 - samples/sec: 32.35 - lr: 0.000022
+ 2022-02-05 07:57:55,686 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 07:57:55,687 EPOCH 6 done: loss 0.0617 - lr 0.0000222
+ 2022-02-05 08:00:45,565 DEV : loss 0.01982131227850914 - f1-score (micro avg) 0.9358
+ 2022-02-05 08:00:45,625 BAD EPOCHS (no improvement): 4
+ 2022-02-05 08:00:45,644 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 08:07:26,967 epoch 7 - iter 793/7936 - loss 0.05520420 - samples/sec: 31.62 - lr: 0.000022
+ 2022-02-05 08:13:58,782 epoch 7 - iter 1586/7936 - loss 0.05522964 - samples/sec: 32.39 - lr: 0.000021
+ 2022-02-05 08:20:32,705 epoch 7 - iter 2379/7936 - loss 0.05482898 - samples/sec: 32.21 - lr: 0.000021
+ 2022-02-05 08:27:14,353 epoch 7 - iter 3172/7936 - loss 0.05433105 - samples/sec: 31.59 - lr: 0.000020
+ 2022-02-05 08:33:45,236 epoch 7 - iter 3965/7936 - loss 0.05397125 - samples/sec: 32.47 - lr: 0.000019
+ 2022-02-05 08:40:14,072 epoch 7 - iter 4758/7936 - loss 0.05348281 - samples/sec: 32.64 - lr: 0.000019
+ 2022-02-05 08:46:52,674 epoch 7 - iter 5551/7936 - loss 0.05316673 - samples/sec: 31.84 - lr: 0.000018
+ 2022-02-05 08:53:20,653 epoch 7 - iter 6344/7936 - loss 0.05275831 - samples/sec: 32.71 - lr: 0.000018
+ 2022-02-05 08:59:52,741 epoch 7 - iter 7137/7936 - loss 0.05230036 - samples/sec: 32.37 - lr: 0.000017
+ 2022-02-05 09:06:38,983 epoch 7 - iter 7930/7936 - loss 0.05190552 - samples/sec: 31.24 - lr: 0.000017
+ 2022-02-05 09:06:41,639 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 09:06:41,639 EPOCH 7 done: loss 0.0519 - lr 0.0000167
+ 2022-02-05 09:09:20,864 DEV : loss 0.02467426098883152 - f1-score (micro avg) 0.9355
+ 2022-02-05 09:09:20,924 BAD EPOCHS (no improvement): 4
+ 2022-02-05 09:09:20,939 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 09:16:05,134 epoch 8 - iter 793/7936 - loss 0.04726178 - samples/sec: 31.40 - lr: 0.000016
+ 2022-02-05 09:22:33,870 epoch 8 - iter 1586/7936 - loss 0.04719666 - samples/sec: 32.64 - lr: 0.000016
+ 2022-02-05 09:29:02,929 epoch 8 - iter 2379/7936 - loss 0.04663752 - samples/sec: 32.62 - lr: 0.000015
+ 2022-02-05 09:35:42,369 epoch 8 - iter 3172/7936 - loss 0.04634901 - samples/sec: 31.77 - lr: 0.000014
+ 2022-02-05 09:42:14,843 epoch 8 - iter 3965/7936 - loss 0.04602895 - samples/sec: 32.33 - lr: 0.000014
+ 2022-02-05 09:48:48,062 epoch 8 - iter 4758/7936 - loss 0.04582764 - samples/sec: 32.27 - lr: 0.000013
+ 2022-02-05 09:55:28,863 epoch 8 - iter 5551/7936 - loss 0.04566599 - samples/sec: 31.66 - lr: 0.000013
+ 2022-02-05 10:01:52,699 epoch 8 - iter 6344/7936 - loss 0.04545939 - samples/sec: 33.06 - lr: 0.000012
+ 2022-02-05 10:08:33,137 epoch 8 - iter 7137/7936 - loss 0.04526206 - samples/sec: 31.69 - lr: 0.000012
+ 2022-02-05 10:15:07,241 epoch 8 - iter 7930/7936 - loss 0.04503385 - samples/sec: 32.20 - lr: 0.000011
+ 2022-02-05 10:15:10,600 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 10:15:10,600 EPOCH 8 done: loss 0.0450 - lr 0.0000111
+ 2022-02-05 10:18:00,280 DEV : loss 0.02364770695567131 - f1-score (micro avg) 0.9371
+ 2022-02-05 10:18:00,339 BAD EPOCHS (no improvement): 4
+ 2022-02-05 10:18:00,358 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 10:24:31,011 epoch 9 - iter 793/7936 - loss 0.04122325 - samples/sec: 32.48 - lr: 0.000011
+ 2022-02-05 10:31:00,279 epoch 9 - iter 1586/7936 - loss 0.04130931 - samples/sec: 32.60 - lr: 0.000010
+ 2022-02-05 10:37:40,369 epoch 9 - iter 2379/7936 - loss 0.04131112 - samples/sec: 31.72 - lr: 0.000009
+ 2022-02-05 10:44:11,067 epoch 9 - iter 3172/7936 - loss 0.04141124 - samples/sec: 32.48 - lr: 0.000009
+ 2022-02-05 10:50:41,270 epoch 9 - iter 3965/7936 - loss 0.04120608 - samples/sec: 32.52 - lr: 0.000008
+ 2022-02-05 10:57:24,718 epoch 9 - iter 4758/7936 - loss 0.04108655 - samples/sec: 31.45 - lr: 0.000008
+ 2022-02-05 11:04:00,581 epoch 9 - iter 5551/7936 - loss 0.04093370 - samples/sec: 32.06 - lr: 0.000007
+ 2022-02-05 11:10:31,042 epoch 9 - iter 6344/7936 - loss 0.04078404 - samples/sec: 32.50 - lr: 0.000007
+ 2022-02-05 11:17:13,751 epoch 9 - iter 7137/7936 - loss 0.04061073 - samples/sec: 31.51 - lr: 0.000006
+ 2022-02-05 11:23:44,231 epoch 9 - iter 7930/7936 - loss 0.04050638 - samples/sec: 32.50 - lr: 0.000006
+ 2022-02-05 11:23:47,941 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 11:23:47,942 EPOCH 9 done: loss 0.0405 - lr 0.0000056
+ 2022-02-05 11:26:37,114 DEV : loss 0.026182951405644417 - f1-score (micro avg) 0.9361
+ 2022-02-05 11:26:37,173 BAD EPOCHS (no improvement): 4
+ 2022-02-05 11:26:37,186 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 11:33:05,778 epoch 10 - iter 793/7936 - loss 0.03876526 - samples/sec: 32.66 - lr: 0.000005
+ 2022-02-05 11:39:45,501 epoch 10 - iter 1586/7936 - loss 0.03871561 - samples/sec: 31.75 - lr: 0.000004
+ 2022-02-05 11:46:18,242 epoch 10 - iter 2379/7936 - loss 0.03842790 - samples/sec: 32.31 - lr: 0.000004
+ 2022-02-05 11:52:48,370 epoch 10 - iter 3172/7936 - loss 0.03820246 - samples/sec: 32.53 - lr: 0.000003
+ 2022-02-05 11:59:28,420 epoch 10 - iter 3965/7936 - loss 0.03807900 - samples/sec: 31.72 - lr: 0.000003
+ 2022-02-05 12:05:57,882 epoch 10 - iter 4758/7936 - loss 0.03798954 - samples/sec: 32.58 - lr: 0.000002
+ 2022-02-05 12:12:25,766 epoch 10 - iter 5551/7936 - loss 0.03803371 - samples/sec: 32.72 - lr: 0.000002
+ 2022-02-05 12:19:03,411 epoch 10 - iter 6344/7936 - loss 0.03805844 - samples/sec: 31.91 - lr: 0.000001
+ 2022-02-05 12:25:27,539 epoch 10 - iter 7137/7936 - loss 0.03799490 - samples/sec: 33.04 - lr: 0.000001
+ 2022-02-05 12:31:55,442 epoch 10 - iter 7930/7936 - loss 0.03798541 - samples/sec: 32.71 - lr: 0.000000
+ 2022-02-05 12:31:58,461 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 12:31:58,462 EPOCH 10 done: loss 0.0380 - lr 0.0000000
+ 2022-02-05 12:34:45,700 DEV : loss 0.027400659397244453 - f1-score (micro avg) 0.9368
+ 2022-02-05 12:34:45,760 BAD EPOCHS (no improvement): 4
+ 2022-02-05 12:34:46,755 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 12:34:46,757 Testing using last state of model ...
+ 2022-02-05 12:37:34,421 0.9329 0.9323 0.9326 0.8893
+ 2022-02-05 12:37:34,422
+ Results:
+ - F-score (micro) 0.9326
+ - F-score (macro) 0.9111
+ - Accuracy 0.8893
+
+ By class:
+                precision    recall  f1-score   support
+
+          pers     0.9355    0.9279    0.9317      2734
+           loc     0.9242    0.9335    0.9288      1384
+        amount     0.9800    0.9800    0.9800       250
+          time     0.9456    0.9576    0.9516       236
+          func     0.9333    0.9000    0.9164       140
+           org     0.8148    0.8980    0.8544        49
+          prod     0.8621    0.9259    0.8929        27
+         event     0.8333    0.8333    0.8333        12
+
+     micro avg     0.9329    0.9323    0.9326      4832
+     macro avg     0.9036    0.9195    0.9111      4832
+  weighted avg     0.9331    0.9323    0.9327      4832
+   samples avg     0.8893    0.8893    0.8893      4832
+
+ 2022-02-05 12:37:34,422 ----------------------------------------------------------------------------------------------------
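
Note: the last model state scores micro-F1 0.9326 on the test split. A minimal usage sketch for the uploaded checkpoint, assuming the tag type is "ner" as the 18-label head suggests; the example sentence is ours, not from this commit:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint uploaded in this commit.
tagger = SequenceTagger.load("final-model.pt")

# Tag a sentence and print the predicted entity spans.
sentence = Sentence("Jean de La Fontaine est né à Château-Thierry.")
tagger.predict(sentence)

for entity in sentence.get_spans("ner"):
    print(entity)
```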
weights.txt ADDED
File without changes