2023-10-27 17:20:54,808 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,809 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-27 17:20:54,809 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,809 Corpus: 14903 train + 3449 dev + 3658 test sentences
2023-10-27 17:20:54,809 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,809 Train:  14903 sentences
2023-10-27 17:20:54,809         (train_with_dev=False, train_with_test=False)
2023-10-27 17:20:54,809 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,809 Training Params:
2023-10-27 17:20:54,809  - learning_rate: "5e-06" 
2023-10-27 17:20:54,809  - mini_batch_size: "4"
2023-10-27 17:20:54,809  - max_epochs: "10"
2023-10-27 17:20:54,809  - shuffle: "True"
2023-10-27 17:20:54,810 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,810 Plugins:
2023-10-27 17:20:54,810  - TensorboardLogger
2023-10-27 17:20:54,810  - LinearScheduler | warmup_fraction: '0.1'
2023-10-27 17:20:54,810 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,810 Final evaluation on model from best epoch (best-model.pt)
2023-10-27 17:20:54,810  - metric: "('micro avg', 'f1-score')"
2023-10-27 17:20:54,810 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,810 Computation:
2023-10-27 17:20:54,810  - compute on device: cuda:0
2023-10-27 17:20:54,810  - embedding storage: none
2023-10-27 17:20:54,810 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,810 Model training base path: "flair-clean-conll-lr5e-06-bs4-3"
2023-10-27 17:20:54,810 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,810 ----------------------------------------------------------------------------------------------------
2023-10-27 17:20:54,810 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-27 17:21:40,162 epoch 1 - iter 372/3726 - loss 2.98651987 - time (sec): 45.35 - samples/sec: 437.70 - lr: 0.000000 - momentum: 0.000000
2023-10-27 17:22:25,895 epoch 1 - iter 744/3726 - loss 1.95456152 - time (sec): 91.08 - samples/sec: 446.47 - lr: 0.000001 - momentum: 0.000000
2023-10-27 17:23:14,767 epoch 1 - iter 1116/3726 - loss 1.47436963 - time (sec): 139.96 - samples/sec: 438.22 - lr: 0.000001 - momentum: 0.000000
2023-10-27 17:24:00,286 epoch 1 - iter 1488/3726 - loss 1.21396490 - time (sec): 185.47 - samples/sec: 440.46 - lr: 0.000002 - momentum: 0.000000
2023-10-27 17:24:46,086 epoch 1 - iter 1860/3726 - loss 1.02707175 - time (sec): 231.27 - samples/sec: 442.97 - lr: 0.000002 - momentum: 0.000000
2023-10-27 17:25:31,827 epoch 1 - iter 2232/3726 - loss 0.89439809 - time (sec): 277.02 - samples/sec: 442.39 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:26:17,786 epoch 1 - iter 2604/3726 - loss 0.78674618 - time (sec): 322.97 - samples/sec: 443.93 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:27:04,447 epoch 1 - iter 2976/3726 - loss 0.70312379 - time (sec): 369.64 - samples/sec: 443.20 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:27:50,340 epoch 1 - iter 3348/3726 - loss 0.63740018 - time (sec): 415.53 - samples/sec: 442.57 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:28:36,141 epoch 1 - iter 3720/3726 - loss 0.58486957 - time (sec): 461.33 - samples/sec: 442.64 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:28:36,887 ----------------------------------------------------------------------------------------------------
2023-10-27 17:28:36,887 EPOCH 1 done: loss 0.5838 - lr: 0.000005
2023-10-27 17:28:59,836 DEV : loss 0.08319637179374695 - f1-score (micro avg)  0.9362
2023-10-27 17:28:59,887 saving best model
2023-10-27 17:29:02,186 ----------------------------------------------------------------------------------------------------
2023-10-27 17:29:48,303 epoch 2 - iter 372/3726 - loss 0.11238039 - time (sec): 46.11 - samples/sec: 434.33 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:30:34,246 epoch 2 - iter 744/3726 - loss 0.10205120 - time (sec): 92.06 - samples/sec: 440.16 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:31:20,589 epoch 2 - iter 1116/3726 - loss 0.09058251 - time (sec): 138.40 - samples/sec: 447.87 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:32:06,506 epoch 2 - iter 1488/3726 - loss 0.09156566 - time (sec): 184.32 - samples/sec: 443.90 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:32:52,277 epoch 2 - iter 1860/3726 - loss 0.08901480 - time (sec): 230.09 - samples/sec: 444.10 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:33:38,125 epoch 2 - iter 2232/3726 - loss 0.08767046 - time (sec): 275.94 - samples/sec: 440.49 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:34:23,826 epoch 2 - iter 2604/3726 - loss 0.08532145 - time (sec): 321.64 - samples/sec: 441.24 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:35:09,960 epoch 2 - iter 2976/3726 - loss 0.08526503 - time (sec): 367.77 - samples/sec: 442.87 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:35:56,636 epoch 2 - iter 3348/3726 - loss 0.08368925 - time (sec): 414.45 - samples/sec: 443.20 - lr: 0.000005 - momentum: 0.000000
2023-10-27 17:36:42,804 epoch 2 - iter 3720/3726 - loss 0.08396268 - time (sec): 460.62 - samples/sec: 443.41 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:36:43,519 ----------------------------------------------------------------------------------------------------
2023-10-27 17:36:43,519 EPOCH 2 done: loss 0.0838 - lr: 0.000004
2023-10-27 17:37:07,571 DEV : loss 0.06706252694129944 - f1-score (micro avg)  0.9574
2023-10-27 17:37:07,624 saving best model
2023-10-27 17:37:10,584 ----------------------------------------------------------------------------------------------------
2023-10-27 17:37:57,573 epoch 3 - iter 372/3726 - loss 0.04814695 - time (sec): 46.99 - samples/sec: 438.53 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:38:44,337 epoch 3 - iter 744/3726 - loss 0.04886821 - time (sec): 93.75 - samples/sec: 438.14 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:39:31,726 epoch 3 - iter 1116/3726 - loss 0.05014060 - time (sec): 141.14 - samples/sec: 435.62 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:40:18,792 epoch 3 - iter 1488/3726 - loss 0.05220008 - time (sec): 188.21 - samples/sec: 437.54 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:41:05,149 epoch 3 - iter 1860/3726 - loss 0.05148240 - time (sec): 234.56 - samples/sec: 437.23 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:41:52,025 epoch 3 - iter 2232/3726 - loss 0.05339505 - time (sec): 281.44 - samples/sec: 437.28 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:42:38,711 epoch 3 - iter 2604/3726 - loss 0.05374593 - time (sec): 328.12 - samples/sec: 438.62 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:43:25,619 epoch 3 - iter 2976/3726 - loss 0.05287703 - time (sec): 375.03 - samples/sec: 437.97 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:44:13,432 epoch 3 - iter 3348/3726 - loss 0.05256041 - time (sec): 422.85 - samples/sec: 435.76 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:45:00,348 epoch 3 - iter 3720/3726 - loss 0.05257701 - time (sec): 469.76 - samples/sec: 434.99 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:45:01,081 ----------------------------------------------------------------------------------------------------
2023-10-27 17:45:01,082 EPOCH 3 done: loss 0.0526 - lr: 0.000004
2023-10-27 17:45:25,642 DEV : loss 0.04900110512971878 - f1-score (micro avg)  0.9632
2023-10-27 17:45:25,698 saving best model
2023-10-27 17:45:28,617 ----------------------------------------------------------------------------------------------------
2023-10-27 17:46:15,914 epoch 4 - iter 372/3726 - loss 0.03649343 - time (sec): 47.29 - samples/sec: 421.21 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:47:02,348 epoch 4 - iter 744/3726 - loss 0.03904655 - time (sec): 93.73 - samples/sec: 428.89 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:47:50,016 epoch 4 - iter 1116/3726 - loss 0.03747173 - time (sec): 141.40 - samples/sec: 431.76 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:48:37,609 epoch 4 - iter 1488/3726 - loss 0.03962095 - time (sec): 188.99 - samples/sec: 432.03 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:49:24,478 epoch 4 - iter 1860/3726 - loss 0.03665861 - time (sec): 235.86 - samples/sec: 435.26 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:50:10,615 epoch 4 - iter 2232/3726 - loss 0.03744683 - time (sec): 282.00 - samples/sec: 436.00 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:50:56,634 epoch 4 - iter 2604/3726 - loss 0.03718038 - time (sec): 328.01 - samples/sec: 438.31 - lr: 0.000004 - momentum: 0.000000
2023-10-27 17:51:41,898 epoch 4 - iter 2976/3726 - loss 0.03558423 - time (sec): 373.28 - samples/sec: 440.32 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:52:27,923 epoch 4 - iter 3348/3726 - loss 0.03562024 - time (sec): 419.30 - samples/sec: 439.49 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:53:14,720 epoch 4 - iter 3720/3726 - loss 0.03527266 - time (sec): 466.10 - samples/sec: 438.45 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:53:15,487 ----------------------------------------------------------------------------------------------------
2023-10-27 17:53:15,487 EPOCH 4 done: loss 0.0353 - lr: 0.000003
2023-10-27 17:53:38,247 DEV : loss 0.05077873915433884 - f1-score (micro avg)  0.9689
2023-10-27 17:53:38,300 saving best model
2023-10-27 17:53:41,578 ----------------------------------------------------------------------------------------------------
2023-10-27 17:54:28,406 epoch 5 - iter 372/3726 - loss 0.01635597 - time (sec): 46.83 - samples/sec: 431.15 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:55:14,040 epoch 5 - iter 744/3726 - loss 0.01995793 - time (sec): 92.46 - samples/sec: 438.55 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:56:00,008 epoch 5 - iter 1116/3726 - loss 0.02271135 - time (sec): 138.43 - samples/sec: 439.48 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:56:46,029 epoch 5 - iter 1488/3726 - loss 0.02370028 - time (sec): 184.45 - samples/sec: 439.76 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:57:32,057 epoch 5 - iter 1860/3726 - loss 0.02496095 - time (sec): 230.48 - samples/sec: 437.98 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:58:18,548 epoch 5 - iter 2232/3726 - loss 0.02420606 - time (sec): 276.97 - samples/sec: 436.18 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:59:05,818 epoch 5 - iter 2604/3726 - loss 0.02385058 - time (sec): 324.24 - samples/sec: 438.78 - lr: 0.000003 - momentum: 0.000000
2023-10-27 17:59:52,270 epoch 5 - iter 2976/3726 - loss 0.02471771 - time (sec): 370.69 - samples/sec: 439.50 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:00:39,106 epoch 5 - iter 3348/3726 - loss 0.02672304 - time (sec): 417.53 - samples/sec: 440.28 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:01:25,971 epoch 5 - iter 3720/3726 - loss 0.02642411 - time (sec): 464.39 - samples/sec: 439.82 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:01:26,740 ----------------------------------------------------------------------------------------------------
2023-10-27 18:01:26,740 EPOCH 5 done: loss 0.0264 - lr: 0.000003
2023-10-27 18:01:50,293 DEV : loss 0.05235698074102402 - f1-score (micro avg)  0.972
2023-10-27 18:01:50,346 saving best model
2023-10-27 18:01:53,254 ----------------------------------------------------------------------------------------------------
2023-10-27 18:02:39,752 epoch 6 - iter 372/3726 - loss 0.02916463 - time (sec): 46.49 - samples/sec: 446.96 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:03:25,626 epoch 6 - iter 744/3726 - loss 0.02452630 - time (sec): 92.37 - samples/sec: 442.98 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:04:11,837 epoch 6 - iter 1116/3726 - loss 0.02460461 - time (sec): 138.58 - samples/sec: 443.92 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:04:59,229 epoch 6 - iter 1488/3726 - loss 0.02344474 - time (sec): 185.97 - samples/sec: 441.02 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:05:46,701 epoch 6 - iter 1860/3726 - loss 0.02371111 - time (sec): 233.44 - samples/sec: 438.96 - lr: 0.000003 - momentum: 0.000000
2023-10-27 18:06:33,625 epoch 6 - iter 2232/3726 - loss 0.02288733 - time (sec): 280.37 - samples/sec: 438.13 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:07:19,620 epoch 6 - iter 2604/3726 - loss 0.02107152 - time (sec): 326.36 - samples/sec: 438.12 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:08:06,623 epoch 6 - iter 2976/3726 - loss 0.02064455 - time (sec): 373.36 - samples/sec: 437.04 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:08:52,249 epoch 6 - iter 3348/3726 - loss 0.02103691 - time (sec): 418.99 - samples/sec: 438.95 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:09:38,374 epoch 6 - iter 3720/3726 - loss 0.02102916 - time (sec): 465.12 - samples/sec: 439.25 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:09:39,127 ----------------------------------------------------------------------------------------------------
2023-10-27 18:09:39,128 EPOCH 6 done: loss 0.0211 - lr: 0.000002
2023-10-27 18:10:02,947 DEV : loss 0.05808666720986366 - f1-score (micro avg)  0.9682
2023-10-27 18:10:03,000 ----------------------------------------------------------------------------------------------------
2023-10-27 18:10:49,637 epoch 7 - iter 372/3726 - loss 0.01575730 - time (sec): 46.63 - samples/sec: 435.88 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:11:35,967 epoch 7 - iter 744/3726 - loss 0.01398724 - time (sec): 92.96 - samples/sec: 438.00 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:12:22,831 epoch 7 - iter 1116/3726 - loss 0.01313005 - time (sec): 139.83 - samples/sec: 442.95 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:13:08,773 epoch 7 - iter 1488/3726 - loss 0.01291165 - time (sec): 185.77 - samples/sec: 443.62 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:13:54,815 epoch 7 - iter 1860/3726 - loss 0.01295979 - time (sec): 231.81 - samples/sec: 441.98 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:14:40,721 epoch 7 - iter 2232/3726 - loss 0.01255139 - time (sec): 277.72 - samples/sec: 442.13 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:15:28,159 epoch 7 - iter 2604/3726 - loss 0.01200459 - time (sec): 325.16 - samples/sec: 439.39 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:16:14,038 epoch 7 - iter 2976/3726 - loss 0.01248980 - time (sec): 371.04 - samples/sec: 440.26 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:16:59,656 epoch 7 - iter 3348/3726 - loss 0.01321463 - time (sec): 416.65 - samples/sec: 441.16 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:17:45,100 epoch 7 - iter 3720/3726 - loss 0.01382182 - time (sec): 462.10 - samples/sec: 442.14 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:17:45,799 ----------------------------------------------------------------------------------------------------
2023-10-27 18:17:45,800 EPOCH 7 done: loss 0.0138 - lr: 0.000002
2023-10-27 18:18:08,992 DEV : loss 0.058880679309368134 - f1-score (micro avg)  0.9703
2023-10-27 18:18:09,048 ----------------------------------------------------------------------------------------------------
2023-10-27 18:18:55,063 epoch 8 - iter 372/3726 - loss 0.01490756 - time (sec): 46.01 - samples/sec: 448.34 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:19:41,086 epoch 8 - iter 744/3726 - loss 0.01079045 - time (sec): 92.03 - samples/sec: 439.19 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:20:27,084 epoch 8 - iter 1116/3726 - loss 0.01191125 - time (sec): 138.03 - samples/sec: 443.50 - lr: 0.000002 - momentum: 0.000000
2023-10-27 18:21:12,462 epoch 8 - iter 1488/3726 - loss 0.01100526 - time (sec): 183.41 - samples/sec: 451.17 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:21:58,283 epoch 8 - iter 1860/3726 - loss 0.01185326 - time (sec): 229.23 - samples/sec: 448.88 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:22:44,335 epoch 8 - iter 2232/3726 - loss 0.01176460 - time (sec): 275.28 - samples/sec: 445.95 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:23:30,785 epoch 8 - iter 2604/3726 - loss 0.01194894 - time (sec): 321.73 - samples/sec: 442.88 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:24:17,599 epoch 8 - iter 2976/3726 - loss 0.01190633 - time (sec): 368.55 - samples/sec: 441.32 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:25:04,773 epoch 8 - iter 3348/3726 - loss 0.01189745 - time (sec): 415.72 - samples/sec: 441.59 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:25:51,526 epoch 8 - iter 3720/3726 - loss 0.01191728 - time (sec): 462.48 - samples/sec: 441.51 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:25:52,333 ----------------------------------------------------------------------------------------------------
2023-10-27 18:25:52,333 EPOCH 8 done: loss 0.0120 - lr: 0.000001
2023-10-27 18:26:16,427 DEV : loss 0.05278489366173744 - f1-score (micro avg)  0.9741
2023-10-27 18:26:16,480 saving best model
2023-10-27 18:26:19,470 ----------------------------------------------------------------------------------------------------
2023-10-27 18:27:05,544 epoch 9 - iter 372/3726 - loss 0.00965068 - time (sec): 46.07 - samples/sec: 441.62 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:27:51,054 epoch 9 - iter 744/3726 - loss 0.00782923 - time (sec): 91.58 - samples/sec: 445.46 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:28:36,410 epoch 9 - iter 1116/3726 - loss 0.00714565 - time (sec): 136.94 - samples/sec: 450.28 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:29:22,141 epoch 9 - iter 1488/3726 - loss 0.00788002 - time (sec): 182.67 - samples/sec: 451.72 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:30:08,759 epoch 9 - iter 1860/3726 - loss 0.00817722 - time (sec): 229.29 - samples/sec: 448.24 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:30:54,418 epoch 9 - iter 2232/3726 - loss 0.00812347 - time (sec): 274.95 - samples/sec: 446.63 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:31:40,019 epoch 9 - iter 2604/3726 - loss 0.00818017 - time (sec): 320.55 - samples/sec: 446.97 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:32:25,812 epoch 9 - iter 2976/3726 - loss 0.00800987 - time (sec): 366.34 - samples/sec: 445.32 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:33:11,863 epoch 9 - iter 3348/3726 - loss 0.00815904 - time (sec): 412.39 - samples/sec: 445.22 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:33:57,717 epoch 9 - iter 3720/3726 - loss 0.00780421 - time (sec): 458.24 - samples/sec: 445.83 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:33:58,426 ----------------------------------------------------------------------------------------------------
2023-10-27 18:33:58,426 EPOCH 9 done: loss 0.0078 - lr: 0.000001
2023-10-27 18:34:21,819 DEV : loss 0.05219843238592148 - f1-score (micro avg)  0.9766
2023-10-27 18:34:21,872 saving best model
2023-10-27 18:34:24,714 ----------------------------------------------------------------------------------------------------
2023-10-27 18:35:10,773 epoch 10 - iter 372/3726 - loss 0.00735364 - time (sec): 46.06 - samples/sec: 439.89 - lr: 0.000001 - momentum: 0.000000
2023-10-27 18:35:55,957 epoch 10 - iter 744/3726 - loss 0.00724933 - time (sec): 91.24 - samples/sec: 448.71 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:36:41,613 epoch 10 - iter 1116/3726 - loss 0.00540230 - time (sec): 136.90 - samples/sec: 451.22 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:37:26,965 epoch 10 - iter 1488/3726 - loss 0.00604023 - time (sec): 182.25 - samples/sec: 453.12 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:38:12,558 epoch 10 - iter 1860/3726 - loss 0.00646998 - time (sec): 227.84 - samples/sec: 452.95 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:38:58,762 epoch 10 - iter 2232/3726 - loss 0.00621628 - time (sec): 274.05 - samples/sec: 449.84 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:39:45,050 epoch 10 - iter 2604/3726 - loss 0.00611127 - time (sec): 320.33 - samples/sec: 447.82 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:40:30,686 epoch 10 - iter 2976/3726 - loss 0.00609183 - time (sec): 365.97 - samples/sec: 447.92 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:41:16,049 epoch 10 - iter 3348/3726 - loss 0.00601238 - time (sec): 411.33 - samples/sec: 447.77 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:42:01,610 epoch 10 - iter 3720/3726 - loss 0.00601111 - time (sec): 456.89 - samples/sec: 447.20 - lr: 0.000000 - momentum: 0.000000
2023-10-27 18:42:02,342 ----------------------------------------------------------------------------------------------------
2023-10-27 18:42:02,343 EPOCH 10 done: loss 0.0060 - lr: 0.000000
2023-10-27 18:42:25,659 DEV : loss 0.05048835650086403 - f1-score (micro avg)  0.9762
2023-10-27 18:42:28,164 ----------------------------------------------------------------------------------------------------
2023-10-27 18:42:28,165 Loading model from best epoch ...
2023-10-27 18:42:36,107 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-MISC, B-MISC, E-MISC, I-MISC
2023-10-27 18:42:59,035 
Results:
- F-score (micro) 0.9702
- F-score (macro) 0.9649
- Accuracy 0.956

By class:
              precision    recall  f1-score   support

         ORG     0.9623    0.9749    0.9685      1909
         PER     0.9962    0.9950    0.9956      1591
         LOC     0.9729    0.9660    0.9695      1413
        MISC     0.9178    0.9347    0.9262       812

   micro avg     0.9678    0.9726    0.9702      5725
   macro avg     0.9623    0.9676    0.9649      5725
weighted avg     0.9680    0.9726    0.9703      5725

2023-10-27 18:42:59,036 ----------------------------------------------------------------------------------------------------
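
The summary scores above can be cross-checked against the per-class table. The following is a minimal sketch, not part of the original run: it assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro F1 as the unweighted mean of the per-class F1 values) and uses only the four-decimal figures printed in the log, so small rounding differences are expected. The helper name `f1` and the variable names are illustrative.

```python
# Recompute the reported F-scores from the precision/recall values in the log.
# Inputs are the rounded figures from the "By class" table, so results can
# differ from the logged values in the fourth decimal place.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per entity class, copied from the table above.
per_class = {
    "ORG": (0.9623, 0.9749),
    "PER": (0.9962, 0.9950),
    "LOC": (0.9729, 0.9660),
    "MISC": (0.9178, 0.9347),
}

# Micro average: F1 of the pooled precision and recall ("micro avg" row).
micro_f1 = f1(0.9678, 0.9726)

# Macro average: unweighted mean of the per-class F1 scores.
class_f1 = {label: f1(p, r) for label, (p, r) in per_class.items()}
macro_f1 = sum(class_f1.values()) / len(class_f1)

print(f"micro F1 ~ {micro_f1:.4f}")
print(f"macro F1 ~ {macro_f1:.4f}")
```

Both recomputed values should agree with the logged F-score (micro) 0.9702 and F-score (macro) 0.9649 up to rounding of the inputs.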