pjox committed
Commit
2642fb3
1 Parent(s): 628c867

Upload 6 files

Files changed (6)
  1. dev.tsv +0 -0
  2. final-model.pt +3 -0
  3. loss.tsv +11 -0
  4. test.tsv +0 -0
  5. training.log +538 -0
  6. weights.txt +0 -0
dev.tsv ADDED
The diff for this file is too large to render.
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ce184125031081cef1c7b103a2731875c894080ae0b604bc5b781e87a7a62d0
+ size 442756141
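
final-model.pt is committed as a Git LFS pointer, so the three lines above are metadata only; the ~443 MB checkpoint itself must be fetched (e.g. with git lfs pull, or via the Hub's download endpoints) before use. A minimal loading sketch, assuming the flair package (the training log below shows a Flair SequenceTagger) and an invented example sentence:

    # A minimal sketch, assuming the flair package and that the LFS pointer
    # above has been resolved to the real checkpoint (e.g. via `git lfs pull`).
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("final-model.pt")  # local path; adjust as needed

    sentence = Sentence("Li rois est morz .")  # invented example input
    tagger.predict(sentence)
    print(sentence.to_tagged_string())
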
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 12:56:18 4 0.0000 3.8419383407730647 3.509683847427368 0.3053 0.3053 0.3053 0.3053
+ 2 12:59:20 4 0.0000 3.2227634368718494 2.775869846343994 0.6141 0.6141 0.6141 0.6141
+ 3 13:02:23 4 0.0000 2.7700508728423903 2.410931348800659 0.819 0.819 0.819 0.819
+ 4 13:05:27 4 0.0000 2.5123233380738026 2.1908302307128906 0.8605 0.8605 0.8605 0.8605
+ 5 13:08:30 4 0.0000 2.350920672660358 2.0516607761383057 0.8737 0.8737 0.8737 0.8737
+ 6 13:11:34 4 0.0000 2.2365647102395845 1.9612011909484863 0.884 0.884 0.884 0.884
+ 7 13:14:37 4 0.0000 2.1661910551931784 1.8981177806854248 0.9008 0.9008 0.9008 0.9008
+ 8 13:17:39 4 0.0000 2.1112017686144187 1.8548760414123535 0.9117 0.9117 0.9117 0.9117
+ 9 13:20:43 4 0.0000 2.0759186003590093 1.830302357673645 0.9161 0.9161 0.9161 0.9161
+ 10 13:23:46 4 0.0000 2.0624352113596314 1.8217284679412842 0.9195 0.9195 0.9195 0.9195
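
loss.tsv holds one row per epoch; the four DEV_* columns repeat the same micro-averaged score. A quick way to eyeball convergence, assuming a tab-separated file and pandas/matplotlib as stand-ins for any TSV reader and plotter:

    # A minimal sketch: plot train vs. dev loss per epoch from loss.tsv
    # (tab-separated, with the header row shown above).
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("loss.tsv", sep="\t")
    ax = df.plot(x="EPOCH", y=["TRAIN_LOSS", "DEV_LOSS"], marker="o")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    plt.savefig("loss_curves.png")
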
test.tsv ADDED
The diff for this file is too large to render.
 
training.log ADDED
@@ -0,0 +1,538 @@
+ 2022-02-04 12:53:17,467 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:53:17,468 Model: "SequenceTagger(
+ (embeddings): TransformerWordEmbeddings(
+ (model): CamembertModel(
+ (embeddings): RobertaEmbeddings(
+ (word_embeddings): Embedding(32005, 768, padding_idx=1)
+ (position_embeddings): Embedding(514, 768, padding_idx=1)
+ (token_type_embeddings): Embedding(1, 768)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (encoder): RobertaEncoder(
+ (layer): ModuleList(
+ (0): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (1): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (2): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (3): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (4): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (5): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (6): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (7): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (8): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (9): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (10): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (11): RobertaLayer(
+ (attention): RobertaAttention(
+ (self): RobertaSelfAttention(
+ (query): Linear(in_features=768, out_features=768, bias=True)
+ (key): Linear(in_features=768, out_features=768, bias=True)
+ (value): Linear(in_features=768, out_features=768, bias=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (output): RobertaSelfOutput(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ (intermediate): RobertaIntermediate(
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
+ )
+ (output): RobertaOutput(
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ )
+ )
+ (pooler): RobertaPooler(
+ (dense): Linear(in_features=768, out_features=768, bias=True)
+ (activation): Tanh()
+ )
+ )
+ )
+ (word_dropout): WordDropout(p=0.05)
+ (locked_dropout): LockedDropout(p=0.5)
+ (linear): Linear(in_features=768, out_features=51, bias=True)
+ (beta): 1.0
+ (weights): None
+ (weight_tensor) None
+ )"
+ 2022-02-04 12:53:17,506 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:53:17,506 Corpus: "Corpus: 5642 train + 195 dev + 649 test sentences"
+ 2022-02-04 12:53:17,506 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:53:17,506 Parameters:
+ 2022-02-04 12:53:17,506 - learning_rate: "5e-06"
+ 2022-02-04 12:53:17,506 - mini_batch_size: "32"
+ 2022-02-04 12:53:17,506 - patience: "3"
+ 2022-02-04 12:53:17,506 - anneal_factor: "0.5"
+ 2022-02-04 12:53:17,506 - max_epochs: "10"
+ 2022-02-04 12:53:17,506 - shuffle: "True"
+ 2022-02-04 12:53:17,506 - train_with_dev: "False"
+ 2022-02-04 12:53:17,506 - batch_growth_annealing: "False"
+ 2022-02-04 12:53:17,506 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:53:17,506 Model training base path: "resources/taggers/pos-camembert"
+ 2022-02-04 12:53:17,506 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:53:17,511 Device: cuda:0
+ 2022-02-04 12:53:17,511 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:53:17,511 Embeddings storage mode: none
+ 2022-02-04 12:53:17,513 ----------------------------------------------------------------------------------------------------
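
The logged parameters are enough to sketch how a comparable run could be launched with Flair's ModelTrainer. This is a minimal sketch, not the authors' script: the corpus location, column format, and the "camembert-base" checkpoint are assumptions (the log only shows a CamembertModel and the base path), while use_rnn=False/use_crf=False mirror the plain Linear(768 -> 51) head in the model printout above.

    # Hypothetical reconstruction of the training run from the logged parameters.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = ColumnCorpus("data/", {0: "text", 1: "pos"})  # assumed layout
    tag_dictionary = corpus.make_tag_dictionary(tag_type="pos")

    embeddings = TransformerWordEmbeddings("camembert-base", fine_tune=True)
    tagger = SequenceTagger(
        hidden_size=256,  # unused with use_rnn=False, but required by the API
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="pos",
        use_crf=False,
        use_rnn=False,
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(
        "resources/taggers/pos-camembert",  # base path from the log
        learning_rate=5e-6,
        mini_batch_size=32,
        patience=3,
        anneal_factor=0.5,
        max_epochs=10,
    )
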
+ 2022-02-04 12:53:38,315 epoch 1 - iter 17/177 - loss 3.96872255 - samples/sec: 26.15 - lr: 0.000000
+ 2022-02-04 12:53:54,561 epoch 1 - iter 34/177 - loss 3.96629180 - samples/sec: 33.49 - lr: 0.000001
+ 2022-02-04 12:54:11,140 epoch 1 - iter 51/177 - loss 3.95985736 - samples/sec: 32.82 - lr: 0.000001
+ 2022-02-04 12:54:27,471 epoch 1 - iter 68/177 - loss 3.95248851 - samples/sec: 33.31 - lr: 0.000002
+ 2022-02-04 12:54:44,574 epoch 1 - iter 85/177 - loss 3.94223845 - samples/sec: 31.81 - lr: 0.000002
+ 2022-02-04 12:54:59,811 epoch 1 - iter 102/177 - loss 3.93034373 - samples/sec: 35.71 - lr: 0.000003
+ 2022-02-04 12:55:17,140 epoch 1 - iter 119/177 - loss 3.91667895 - samples/sec: 31.39 - lr: 0.000003
+ 2022-02-04 12:55:33,245 epoch 1 - iter 136/177 - loss 3.90088222 - samples/sec: 33.78 - lr: 0.000004
+ 2022-02-04 12:55:48,743 epoch 1 - iter 153/177 - loss 3.87766994 - samples/sec: 35.11 - lr: 0.000004
+ 2022-02-04 12:56:06,269 epoch 1 - iter 170/177 - loss 3.84880099 - samples/sec: 31.04 - lr: 0.000005
+ 2022-02-04 12:56:12,033 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:56:12,033 EPOCH 1 done: loss 3.8419 - lr 0.0000050
+ 2022-02-04 12:56:18,260 DEV : loss 3.509683847427368 - f1-score (micro avg) 0.3053
+ 2022-02-04 12:56:18,262 BAD EPOCHS (no improvement): 4
+ 2022-02-04 12:56:18,285 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:56:35,575 epoch 2 - iter 17/177 - loss 3.54034313 - samples/sec: 31.47 - lr: 0.000005
+ 2022-02-04 12:56:52,475 epoch 2 - iter 34/177 - loss 3.50300407 - samples/sec: 32.19 - lr: 0.000005
+ 2022-02-04 12:57:09,058 epoch 2 - iter 51/177 - loss 3.46864739 - samples/sec: 32.81 - lr: 0.000005
+ 2022-02-04 12:57:25,624 epoch 2 - iter 68/177 - loss 3.43125430 - samples/sec: 32.84 - lr: 0.000005
+ 2022-02-04 12:57:42,941 epoch 2 - iter 85/177 - loss 3.39270879 - samples/sec: 31.42 - lr: 0.000005
+ 2022-02-04 12:57:59,153 epoch 2 - iter 102/177 - loss 3.35791389 - samples/sec: 33.56 - lr: 0.000005
+ 2022-02-04 12:58:16,864 epoch 2 - iter 119/177 - loss 3.32573531 - samples/sec: 30.72 - lr: 0.000005
+ 2022-02-04 12:58:34,354 epoch 2 - iter 136/177 - loss 3.29370429 - samples/sec: 31.11 - lr: 0.000005
+ 2022-02-04 12:58:51,116 epoch 2 - iter 153/177 - loss 3.26367901 - samples/sec: 32.46 - lr: 0.000005
+ 2022-02-04 12:59:08,117 epoch 2 - iter 170/177 - loss 3.23382669 - samples/sec: 32.00 - lr: 0.000004
+ 2022-02-04 12:59:15,072 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:59:15,074 EPOCH 2 done: loss 3.2228 - lr 0.0000044
+ 2022-02-04 12:59:20,452 DEV : loss 2.775869846343994 - f1-score (micro avg) 0.6141
+ 2022-02-04 12:59:20,455 BAD EPOCHS (no improvement): 4
+ 2022-02-04 12:59:20,455 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:59:38,069 epoch 3 - iter 17/177 - loss 2.92343717 - samples/sec: 30.89 - lr: 0.000004
+ 2022-02-04 12:59:54,400 epoch 3 - iter 34/177 - loss 2.90201388 - samples/sec: 33.32 - lr: 0.000004
+ 2022-02-04 13:00:12,150 epoch 3 - iter 51/177 - loss 2.88495451 - samples/sec: 30.65 - lr: 0.000004
+ 2022-02-04 13:00:28,960 epoch 3 - iter 68/177 - loss 2.86475060 - samples/sec: 32.37 - lr: 0.000004
+ 2022-02-04 13:00:47,016 epoch 3 - iter 85/177 - loss 2.84779479 - samples/sec: 30.13 - lr: 0.000004
+ 2022-02-04 13:01:03,811 epoch 3 - iter 102/177 - loss 2.83018073 - samples/sec: 32.40 - lr: 0.000004
+ 2022-02-04 13:01:19,598 epoch 3 - iter 119/177 - loss 2.81577196 - samples/sec: 34.47 - lr: 0.000004
+ 2022-02-04 13:01:36,746 epoch 3 - iter 136/177 - loss 2.80310518 - samples/sec: 31.73 - lr: 0.000004
+ 2022-02-04 13:01:53,532 epoch 3 - iter 153/177 - loss 2.79075673 - samples/sec: 32.41 - lr: 0.000004
+ 2022-02-04 13:02:11,809 epoch 3 - iter 170/177 - loss 2.77624103 - samples/sec: 29.77 - lr: 0.000004
+ 2022-02-04 13:02:17,990 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:02:17,991 EPOCH 3 done: loss 2.7701 - lr 0.0000039
+ 2022-02-04 13:02:23,777 DEV : loss 2.410931348800659 - f1-score (micro avg) 0.819
+ 2022-02-04 13:02:23,780 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:02:23,781 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:02:41,231 epoch 4 - iter 17/177 - loss 2.60188784 - samples/sec: 31.18 - lr: 0.000004
+ 2022-02-04 13:02:58,635 epoch 4 - iter 34/177 - loss 2.59095213 - samples/sec: 31.26 - lr: 0.000004
+ 2022-02-04 13:03:15,040 epoch 4 - iter 51/177 - loss 2.58502577 - samples/sec: 33.17 - lr: 0.000004
+ 2022-02-04 13:03:32,700 epoch 4 - iter 68/177 - loss 2.57149732 - samples/sec: 30.81 - lr: 0.000004
+ 2022-02-04 13:03:49,889 epoch 4 - iter 85/177 - loss 2.55924475 - samples/sec: 31.65 - lr: 0.000004
+ 2022-02-04 13:04:07,257 epoch 4 - iter 102/177 - loss 2.54972860 - samples/sec: 31.33 - lr: 0.000004
+ 2022-02-04 13:04:24,141 epoch 4 - iter 119/177 - loss 2.54070048 - samples/sec: 32.23 - lr: 0.000004
+ 2022-02-04 13:04:40,320 epoch 4 - iter 136/177 - loss 2.53210863 - samples/sec: 33.69 - lr: 0.000003
+ 2022-02-04 13:04:57,281 epoch 4 - iter 153/177 - loss 2.52441237 - samples/sec: 32.08 - lr: 0.000003
+ 2022-02-04 13:05:15,246 epoch 4 - iter 170/177 - loss 2.51520228 - samples/sec: 30.29 - lr: 0.000003
+ 2022-02-04 13:05:21,452 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:05:21,458 EPOCH 4 done: loss 2.5123 - lr 0.0000033
+ 2022-02-04 13:05:27,295 DEV : loss 2.1908302307128906 - f1-score (micro avg) 0.8605
+ 2022-02-04 13:05:27,310 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:05:27,310 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:05:44,024 epoch 5 - iter 17/177 - loss 2.39887737 - samples/sec: 32.55 - lr: 0.000003
+ 2022-02-04 13:06:01,687 epoch 5 - iter 34/177 - loss 2.39948538 - samples/sec: 30.80 - lr: 0.000003
+ 2022-02-04 13:06:19,664 epoch 5 - iter 51/177 - loss 2.40078878 - samples/sec: 30.29 - lr: 0.000003
+ 2022-02-04 13:06:36,241 epoch 5 - iter 68/177 - loss 2.39524823 - samples/sec: 32.93 - lr: 0.000003
+ 2022-02-04 13:06:52,683 epoch 5 - iter 85/177 - loss 2.38764769 - samples/sec: 33.17 - lr: 0.000003
+ 2022-02-04 13:07:09,718 epoch 5 - iter 102/177 - loss 2.38104055 - samples/sec: 31.94 - lr: 0.000003
+ 2022-02-04 13:07:26,578 epoch 5 - iter 119/177 - loss 2.37384530 - samples/sec: 32.29 - lr: 0.000003
+ 2022-02-04 13:07:42,599 epoch 5 - iter 136/177 - loss 2.36823710 - samples/sec: 33.96 - lr: 0.000003
+ 2022-02-04 13:08:00,031 epoch 5 - iter 153/177 - loss 2.36030726 - samples/sec: 31.25 - lr: 0.000003
+ 2022-02-04 13:08:17,779 epoch 5 - iter 170/177 - loss 2.35368343 - samples/sec: 30.72 - lr: 0.000003
+ 2022-02-04 13:08:24,110 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:08:24,111 EPOCH 5 done: loss 2.3509 - lr 0.0000028
+ 2022-02-04 13:08:30,298 DEV : loss 2.0516607761383057 - f1-score (micro avg) 0.8737
+ 2022-02-04 13:08:30,301 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:08:30,301 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:08:46,667 epoch 6 - iter 17/177 - loss 2.27743160 - samples/sec: 33.25 - lr: 0.000003
+ 2022-02-04 13:09:04,814 epoch 6 - iter 34/177 - loss 2.27286852 - samples/sec: 29.99 - lr: 0.000003
+ 2022-02-04 13:09:21,239 epoch 6 - iter 51/177 - loss 2.27175336 - samples/sec: 33.23 - lr: 0.000003
+ 2022-02-04 13:09:38,163 epoch 6 - iter 68/177 - loss 2.26491131 - samples/sec: 32.15 - lr: 0.000003
+ 2022-02-04 13:09:54,338 epoch 6 - iter 85/177 - loss 2.25999023 - samples/sec: 33.65 - lr: 0.000003
+ 2022-02-04 13:10:12,270 epoch 6 - iter 102/177 - loss 2.25580949 - samples/sec: 30.38 - lr: 0.000002
+ 2022-02-04 13:10:29,245 epoch 6 - iter 119/177 - loss 2.25275307 - samples/sec: 32.13 - lr: 0.000002
+ 2022-02-04 13:10:46,065 epoch 6 - iter 136/177 - loss 2.24661845 - samples/sec: 32.40 - lr: 0.000002
+ 2022-02-04 13:11:03,357 epoch 6 - iter 153/177 - loss 2.24241040 - samples/sec: 31.47 - lr: 0.000002
+ 2022-02-04 13:11:22,211 epoch 6 - iter 170/177 - loss 2.23773462 - samples/sec: 28.87 - lr: 0.000002
+ 2022-02-04 13:11:28,309 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:11:28,321 EPOCH 6 done: loss 2.2366 - lr 0.0000022
+ 2022-02-04 13:11:34,136 DEV : loss 1.9612011909484863 - f1-score (micro avg) 0.884
+ 2022-02-04 13:11:34,150 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:11:34,151 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:11:50,446 epoch 7 - iter 17/177 - loss 2.19566504 - samples/sec: 33.39 - lr: 0.000002
+ 2022-02-04 13:12:06,851 epoch 7 - iter 34/177 - loss 2.19802945 - samples/sec: 33.21 - lr: 0.000002
+ 2022-02-04 13:12:23,401 epoch 7 - iter 51/177 - loss 2.19405535 - samples/sec: 32.88 - lr: 0.000002
+ 2022-02-04 13:12:41,303 epoch 7 - iter 68/177 - loss 2.19162087 - samples/sec: 30.39 - lr: 0.000002
+ 2022-02-04 13:12:58,144 epoch 7 - iter 85/177 - loss 2.18471516 - samples/sec: 32.35 - lr: 0.000002
+ 2022-02-04 13:13:16,467 epoch 7 - iter 102/177 - loss 2.18080579 - samples/sec: 29.75 - lr: 0.000002
+ 2022-02-04 13:13:34,031 epoch 7 - iter 119/177 - loss 2.17936921 - samples/sec: 31.00 - lr: 0.000002
+ 2022-02-04 13:13:51,077 epoch 7 - iter 136/177 - loss 2.17514038 - samples/sec: 32.02 - lr: 0.000002
+ 2022-02-04 13:14:07,857 epoch 7 - iter 153/177 - loss 2.17141812 - samples/sec: 32.48 - lr: 0.000002
+ 2022-02-04 13:14:25,422 epoch 7 - iter 170/177 - loss 2.16711471 - samples/sec: 30.99 - lr: 0.000002
+ 2022-02-04 13:14:31,227 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:14:31,228 EPOCH 7 done: loss 2.1662 - lr 0.0000017
+ 2022-02-04 13:14:37,035 DEV : loss 1.8981177806854248 - f1-score (micro avg) 0.9008
+ 2022-02-04 13:14:37,049 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:14:37,050 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:14:54,867 epoch 8 - iter 17/177 - loss 2.13839948 - samples/sec: 30.54 - lr: 0.000002
+ 2022-02-04 13:15:11,283 epoch 8 - iter 34/177 - loss 2.13301605 - samples/sec: 33.16 - lr: 0.000002
+ 2022-02-04 13:15:28,761 epoch 8 - iter 51/177 - loss 2.12335776 - samples/sec: 31.15 - lr: 0.000002
+ 2022-02-04 13:15:44,480 epoch 8 - iter 68/177 - loss 2.12525500 - samples/sec: 34.61 - lr: 0.000001
+ 2022-02-04 13:16:01,084 epoch 8 - iter 85/177 - loss 2.12100353 - samples/sec: 32.77 - lr: 0.000001
+ 2022-02-04 13:16:17,945 epoch 8 - iter 102/177 - loss 2.12081652 - samples/sec: 32.27 - lr: 0.000001
+ 2022-02-04 13:16:34,469 epoch 8 - iter 119/177 - loss 2.11872473 - samples/sec: 32.93 - lr: 0.000001
+ 2022-02-04 13:16:50,308 epoch 8 - iter 136/177 - loss 2.11635062 - samples/sec: 34.35 - lr: 0.000001
+ 2022-02-04 13:17:07,313 epoch 8 - iter 153/177 - loss 2.11371370 - samples/sec: 32.00 - lr: 0.000001
+ 2022-02-04 13:17:25,553 epoch 8 - iter 170/177 - loss 2.11100152 - samples/sec: 29.83 - lr: 0.000001
+ 2022-02-04 13:17:33,472 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:17:33,473 EPOCH 8 done: loss 2.1112 - lr 0.0000011
+ 2022-02-04 13:17:39,308 DEV : loss 1.8548760414123535 - f1-score (micro avg) 0.9117
+ 2022-02-04 13:17:39,311 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:17:39,311 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:17:56,622 epoch 9 - iter 17/177 - loss 2.06819398 - samples/sec: 31.43 - lr: 0.000001
+ 2022-02-04 13:18:13,360 epoch 9 - iter 34/177 - loss 2.07590305 - samples/sec: 32.51 - lr: 0.000001
+ 2022-02-04 13:18:31,366 epoch 9 - iter 51/177 - loss 2.07666788 - samples/sec: 30.22 - lr: 0.000001
+ 2022-02-04 13:18:49,983 epoch 9 - iter 68/177 - loss 2.07961625 - samples/sec: 29.23 - lr: 0.000001
+ 2022-02-04 13:19:06,239 epoch 9 - iter 85/177 - loss 2.08063462 - samples/sec: 33.47 - lr: 0.000001
+ 2022-02-04 13:19:23,068 epoch 9 - iter 102/177 - loss 2.08002246 - samples/sec: 32.33 - lr: 0.000001
+ 2022-02-04 13:19:40,188 epoch 9 - iter 119/177 - loss 2.07956869 - samples/sec: 31.78 - lr: 0.000001
+ 2022-02-04 13:19:57,482 epoch 9 - iter 136/177 - loss 2.07835867 - samples/sec: 31.47 - lr: 0.000001
+ 2022-02-04 13:20:14,155 epoch 9 - iter 153/177 - loss 2.07750905 - samples/sec: 32.64 - lr: 0.000001
+ 2022-02-04 13:20:31,533 epoch 9 - iter 170/177 - loss 2.07545212 - samples/sec: 31.31 - lr: 0.000001
+ 2022-02-04 13:20:37,466 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:20:37,468 EPOCH 9 done: loss 2.0759 - lr 0.0000006
+ 2022-02-04 13:20:43,299 DEV : loss 1.830302357673645 - f1-score (micro avg) 0.9161
+ 2022-02-04 13:20:43,314 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:20:43,314 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:21:00,247 epoch 10 - iter 17/177 - loss 2.06625894 - samples/sec: 32.13 - lr: 0.000001
+ 2022-02-04 13:21:16,847 epoch 10 - iter 34/177 - loss 2.06850742 - samples/sec: 32.78 - lr: 0.000000
+ 2022-02-04 13:21:34,047 epoch 10 - iter 51/177 - loss 2.06653386 - samples/sec: 31.68 - lr: 0.000000
+ 2022-02-04 13:21:50,597 epoch 10 - iter 68/177 - loss 2.06650174 - samples/sec: 32.88 - lr: 0.000000
+ 2022-02-04 13:22:07,286 epoch 10 - iter 85/177 - loss 2.06409229 - samples/sec: 32.61 - lr: 0.000000
+ 2022-02-04 13:22:25,744 epoch 10 - iter 102/177 - loss 2.06162033 - samples/sec: 29.48 - lr: 0.000000
+ 2022-02-04 13:22:43,419 epoch 10 - iter 119/177 - loss 2.06248176 - samples/sec: 30.78 - lr: 0.000000
+ 2022-02-04 13:22:59,502 epoch 10 - iter 136/177 - loss 2.06392395 - samples/sec: 33.83 - lr: 0.000000
+ 2022-02-04 13:23:16,396 epoch 10 - iter 153/177 - loss 2.06446242 - samples/sec: 32.21 - lr: 0.000000
+ 2022-02-04 13:23:33,136 epoch 10 - iter 170/177 - loss 2.06210437 - samples/sec: 32.50 - lr: 0.000000
+ 2022-02-04 13:23:40,551 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:23:40,552 EPOCH 10 done: loss 2.0624 - lr 0.0000000
+ 2022-02-04 13:23:46,365 DEV : loss 1.8217284679412842 - f1-score (micro avg) 0.9195
+ 2022-02-04 13:23:46,367 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:23:47,542 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:23:47,544 Testing using last state of model ...
+ 2022-02-04 13:24:07,461 0.9181 0.9181 0.9181 0.9181
+ 2022-02-04 13:24:07,462
+ Results:
+ - F-score (micro) 0.9181
+ - F-score (macro) 0.439
+ - Accuracy 0.9181
+
+ By class:
+ precision recall f1-score support
+
+ NOMcom 0.9530 0.9808 0.9667 2130
+ VERcjg 0.9683 0.9935 0.9807 1535
+ PRE 0.8411 0.9940 0.9112 1331
+ PROper 0.9253 0.9963 0.9595 1368
+ PONfbl 0.9824 0.9993 0.9908 1341
+ ADVgen 0.8179 0.8276 0.8227 841
+ PONfrt 0.9721 1.0000 0.9859 662
+ DETdef 0.9393 0.9967 0.9672 606
+ ADJqua 0.8289 0.9400 0.8810 500
+ VERinf 0.9706 0.9960 0.9831 497
+ DETpos 0.9791 0.9979 0.9884 469
+ CONcoo 0.9645 0.9935 0.9788 465
+ CONsub 0.7437 0.9846 0.8473 389
+ VERppe 0.9042 0.9408 0.9221 321
+ DETndf 0.7270 0.9959 0.8405 246
+ NOMpro 0.9485 0.8340 0.8876 265
+ PROrel 0.9398 0.7519 0.8354 270
+ ADVneg 0.9577 0.7528 0.8430 271
+ DETdem 0.9934 0.9742 0.9837 155
+ PROind 1.0000 0.4894 0.6571 188
+ PROadv 0.9000 0.8108 0.8531 111
+ PROdem 1.0000 0.6387 0.7795 119
+ DETind 0.8000 0.7347 0.7660 98
+ PRE.DETdef 0.0000 0.0000 0.0000 183
+ VERppa 0.0000 0.0000 0.0000 67
+ PROimp 0.0000 0.0000 0.0000 54
+ INJ 0.0000 0.0000 0.0000 35
+ DETcar 0.0000 0.0000 0.0000 31
+ ADJind 0.0000 0.0000 0.0000 30
+ PROint 0.0000 0.0000 0.0000 22
+ ADJcar 0.0000 0.0000 0.0000 21
+ PROcar 0.0000 0.0000 0.0000 18
+ DETrel 0.0000 0.0000 0.0000 16
+ ADJord 0.0000 0.0000 0.0000 16
+ PONpga 0.0000 0.0000 0.0000 16
+ PROpos 0.0000 0.0000 0.0000 14
+ PONpdr 0.0000 0.0000 0.0000 13
+ DETint 0.0000 0.0000 0.0000 10
+ PONpxx 0.0000 0.0000 0.0000 6
+ ADVint 0.0000 0.0000 0.0000 5
+ PRE.PROrel 0.0000 0.0000 0.0000 2
+ latin 0.0000 0.0000 0.0000 2
+ PROord 0.0000 0.0000 0.0000 1
+ PRE.PROdem 0.0000 0.0000 0.0000 1
+ PRE.NOMcom 0.0000 0.0000 0.0000 1
+ ETR 0.0000 0.0000 0.0000 1
+ ADVsub 0.0000 0.0000 0.0000 1
+
+ micro avg 0.9181 0.9181 0.9181 14744
+ macro avg 0.4480 0.4388 0.4390 14744
+ weighted avg 0.8876 0.9181 0.8991 14744
+ samples avg 0.9181 0.9181 0.9181 14744
+
+ 2022-02-04 13:24:07,477 ----------------------------------------------------------------------------------------------------
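
The spread between the micro F-score (0.9181) and the macro F-score (0.439) above comes from the long tail of tags the model never predicts (PRE.DETdef, VERppa, INJ, ...): each contributes 0.0000 to the unweighted macro mean while barely moving the pooled micro score. A toy illustration with invented counts, using scikit-learn as an assumed helper:

    # Toy numbers, not from this evaluation: micro-F1 pools every decision,
    # while macro-F1 averages per-tag F1 with equal weight, so a rare tag that
    # is never predicted (here "PROord") drags macro far below micro.
    from sklearn.metrics import f1_score

    y_true = ["NOMcom"] * 50 + ["VERcjg"] * 45 + ["PROord"] * 5
    y_pred = ["NOMcom"] * 50 + ["VERcjg"] * 45 + ["NOMcom"] * 5

    print(f1_score(y_true, y_pred, average="micro"))  # 0.95
    print(f1_score(y_true, y_pred, average="macro"))  # ~0.65 (PROord scores 0.0)
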
weights.txt ADDED
File without changes