2023-11-16 08:54:28,192 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,194 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-11-16 08:54:28,194 ----------------------------------------------------------------------------------------------------
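The architecture printed above is an XLM-RoBERTa-large token classifier with no RNN and no CRF: the transformer embeddings feed a LockedDropout layer and a single Linear head trained with CrossEntropyLoss. As a hedged illustration (not the repo's original training script), this is roughly how such a tagger is built in Flair; all constructor arguments are inferred from the printout, and the 13-tag BIOES dictionary is copied from the "SequenceTagger predicts" line at the end of this log:

    from flair.data import Dictionary
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger

    # 13-tag BIOES dictionary, matching the final "SequenceTagger predicts" line
    tag_dictionary = Dictionary(add_unk=False)
    for tag in ["O", "S-LOC", "B-LOC", "E-LOC", "I-LOC", "S-ORG", "B-ORG",
                "E-ORG", "I-ORG", "S-PER", "B-PER", "E-PER", "I-PER"]:
        tag_dictionary.add_item(tag)

    # 24-layer XLM-R large, hidden size 1024, fine-tuned end to end
    embeddings = TransformerWordEmbeddings("xlm-roberta-large", fine_tune=True)

    tagger = SequenceTagger(
        hidden_size=256,             # ignored when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_rnn=False,               # embeddings go straight to the linear head
        use_crf=False,               # plain CrossEntropyLoss, as in the printout
        reproject_embeddings=False,  # keep Linear(1024 -> 13), no extra projection
    )
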
2023-11-16 08:54:28,194 MultiCorpus: 30000 train + 10000 dev + 10000 test sentences
 - ColumnCorpus Corpus: 20000 train + 0 dev + 0 test sentences - /root/.flair/datasets/ner_multi_xtreme/en
 - ColumnCorpus Corpus: 10000 train + 10000 dev + 10000 test sentences - /root/.flair/datasets/ner_multi_xtreme/ka
2023-11-16 08:54:28,194 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,194 Train:  30000 sentences
2023-11-16 08:54:28,194         (train_with_dev=False, train_with_test=False)
2023-11-16 08:54:28,194 ----------------------------------------------------------------------------------------------------
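For reference, a sketch of how the MultiCorpus above can be assembled: Flair's NER_MULTI_XTREME loader provides the WikiANN/XTREME splits per language, and the 20000 English and 10000 Georgian train sentences match the stock splits. The counts above show that the English dev and test splits were dropped before training, but the exact preprocessing is not recorded in this log, so treat this snippet as an approximation:

    from flair.datasets import NER_MULTI_XTREME

    # English + Georgian WikiANN/XTREME NER data as one MultiCorpus
    corpus = NER_MULTI_XTREME(languages=["en", "ka"])

    # label dictionary for the tagger (equivalent to the hand-built one above)
    tag_dictionary = corpus.make_label_dictionary(label_type="ner")
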
2023-11-16 08:54:28,194 Training Params:
2023-11-16 08:54:28,194  - learning_rate: "5e-06" 
2023-11-16 08:54:28,194  - mini_batch_size: "4"
2023-11-16 08:54:28,194  - max_epochs: "10"
2023-11-16 08:54:28,194  - shuffle: "True"
2023-11-16 08:54:28,194 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,194 Plugins:
2023-11-16 08:54:28,194  - TensorboardLogger
2023-11-16 08:54:28,194  - LinearScheduler | warmup_fraction: '0.1'
2023-11-16 08:54:28,194 ----------------------------------------------------------------------------------------------------
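A minimal sketch of a trainer call matching the "Training Params" and "Plugins" blocks above: Flair's ModelTrainer.fine_tune defaults to AdamW with a linear schedule and warmup_fraction 0.1, which is where the LinearScheduler plugin comes from. How the TensorboardLogger was attached in this particular run is an assumption, shown here via the plugins argument:

    from flair.trainers import ModelTrainer
    # plugin import path is an assumption about this Flair version
    from flair.trainers.plugins import TensorboardLogger

    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-5",
        learning_rate=5e-6,             # as logged above
        mini_batch_size=4,
        max_epochs=10,
        shuffle=True,
        plugins=[TensorboardLogger()],  # LinearScheduler is added by fine_tune itself
    )
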
2023-11-16 08:54:28,194 Final evaluation on model from best epoch (best-model.pt)
2023-11-16 08:54:28,194  - metric: "('micro avg', 'f1-score')"
2023-11-16 08:54:28,194 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,195 Computation:
2023-11-16 08:54:28,195  - compute on device: cuda:0
2023-11-16 08:54:28,195  - embedding storage: none
2023-11-16 08:54:28,195 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,195 Model training base path: "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-5"
2023-11-16 08:54:28,195 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,195 ----------------------------------------------------------------------------------------------------
2023-11-16 08:54:28,195 Logging anything other than scalars to TensorBoard is currently not supported.
2023-11-16 08:56:03,042 epoch 1 - iter 750/7500 - loss 2.73514177 - time (sec): 94.85 - samples/sec: 251.28 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:57:34,557 epoch 1 - iter 1500/7500 - loss 2.27024326 - time (sec): 186.36 - samples/sec: 258.14 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:59:07,587 epoch 1 - iter 2250/7500 - loss 1.96361468 - time (sec): 279.39 - samples/sec: 259.21 - lr: 0.000001 - momentum: 0.000000
2023-11-16 09:00:41,392 epoch 1 - iter 3000/7500 - loss 1.71842298 - time (sec): 373.20 - samples/sec: 259.19 - lr: 0.000002 - momentum: 0.000000
2023-11-16 09:02:13,057 epoch 1 - iter 3750/7500 - loss 1.50867437 - time (sec): 464.86 - samples/sec: 260.24 - lr: 0.000002 - momentum: 0.000000
2023-11-16 09:03:44,283 epoch 1 - iter 4500/7500 - loss 1.34975880 - time (sec): 556.09 - samples/sec: 261.24 - lr: 0.000003 - momentum: 0.000000
2023-11-16 09:05:17,789 epoch 1 - iter 5250/7500 - loss 1.23153815 - time (sec): 649.59 - samples/sec: 261.16 - lr: 0.000003 - momentum: 0.000000
2023-11-16 09:06:51,967 epoch 1 - iter 6000/7500 - loss 1.13901611 - time (sec): 743.77 - samples/sec: 260.11 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:08:25,989 epoch 1 - iter 6750/7500 - loss 1.06518907 - time (sec): 837.79 - samples/sec: 259.14 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:10:02,189 epoch 1 - iter 7500/7500 - loss 1.00335008 - time (sec): 933.99 - samples/sec: 257.81 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:10:02,191 ----------------------------------------------------------------------------------------------------
2023-11-16 09:10:02,191 EPOCH 1 done: loss 1.0034 - lr: 0.000005
2023-11-16 09:10:29,151 DEV : loss 0.25209933519363403 - f1-score (micro avg)  0.818
2023-11-16 09:10:30,799 saving best model
2023-11-16 09:10:32,550 ----------------------------------------------------------------------------------------------------
2023-11-16 09:12:05,182 epoch 2 - iter 750/7500 - loss 0.40758191 - time (sec): 92.63 - samples/sec: 255.89 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:13:40,646 epoch 2 - iter 1500/7500 - loss 0.41354608 - time (sec): 188.09 - samples/sec: 252.51 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:15:13,127 epoch 2 - iter 2250/7500 - loss 0.41469889 - time (sec): 280.57 - samples/sec: 256.53 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:16:46,617 epoch 2 - iter 3000/7500 - loss 0.40864404 - time (sec): 374.06 - samples/sec: 257.65 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:18:19,301 epoch 2 - iter 3750/7500 - loss 0.40728475 - time (sec): 466.75 - samples/sec: 258.79 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:19:50,423 epoch 2 - iter 4500/7500 - loss 0.40431483 - time (sec): 557.87 - samples/sec: 259.91 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:21:22,320 epoch 2 - iter 5250/7500 - loss 0.40010819 - time (sec): 649.77 - samples/sec: 260.46 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:22:53,693 epoch 2 - iter 6000/7500 - loss 0.39994412 - time (sec): 741.14 - samples/sec: 260.74 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:24:27,071 epoch 2 - iter 6750/7500 - loss 0.40164576 - time (sec): 834.52 - samples/sec: 260.23 - lr: 0.000005 - momentum: 0.000000
2023-11-16 09:26:00,518 epoch 2 - iter 7500/7500 - loss 0.40139034 - time (sec): 927.97 - samples/sec: 259.49 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:26:00,523 ----------------------------------------------------------------------------------------------------
2023-11-16 09:26:00,523 EPOCH 2 done: loss 0.4014 - lr: 0.000004
2023-11-16 09:26:28,590 DEV : loss 0.27011075615882874 - f1-score (micro avg)  0.8685
2023-11-16 09:26:30,477 saving best model
2023-11-16 09:26:32,489 ----------------------------------------------------------------------------------------------------
2023-11-16 09:28:06,865 epoch 3 - iter 750/7500 - loss 0.34664813 - time (sec): 94.37 - samples/sec: 259.00 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:29:40,525 epoch 3 - iter 1500/7500 - loss 0.35802918 - time (sec): 188.03 - samples/sec: 256.41 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:31:13,248 epoch 3 - iter 2250/7500 - loss 0.34949160 - time (sec): 280.76 - samples/sec: 259.11 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:32:44,431 epoch 3 - iter 3000/7500 - loss 0.34400653 - time (sec): 371.94 - samples/sec: 261.74 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:34:18,607 epoch 3 - iter 3750/7500 - loss 0.34736997 - time (sec): 466.12 - samples/sec: 259.21 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:35:51,979 epoch 3 - iter 4500/7500 - loss 0.34780959 - time (sec): 559.49 - samples/sec: 258.64 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:37:25,102 epoch 3 - iter 5250/7500 - loss 0.34688026 - time (sec): 652.61 - samples/sec: 258.12 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:38:58,439 epoch 3 - iter 6000/7500 - loss 0.34475842 - time (sec): 745.95 - samples/sec: 257.81 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:40:34,113 epoch 3 - iter 6750/7500 - loss 0.34349763 - time (sec): 841.62 - samples/sec: 256.89 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:42:10,676 epoch 3 - iter 7500/7500 - loss 0.34117216 - time (sec): 938.18 - samples/sec: 256.66 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:42:10,679 ----------------------------------------------------------------------------------------------------
2023-11-16 09:42:10,679 EPOCH 3 done: loss 0.3412 - lr: 0.000004
2023-11-16 09:42:37,676 DEV : loss 0.2926769554615021 - f1-score (micro avg)  0.8843
2023-11-16 09:42:39,581 saving best model
2023-11-16 09:42:41,582 ----------------------------------------------------------------------------------------------------
2023-11-16 09:44:14,986 epoch 4 - iter 750/7500 - loss 0.28935096 - time (sec): 93.40 - samples/sec: 255.70 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:45:47,989 epoch 4 - iter 1500/7500 - loss 0.29093752 - time (sec): 186.40 - samples/sec: 257.90 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:47:19,359 epoch 4 - iter 2250/7500 - loss 0.29888625 - time (sec): 277.77 - samples/sec: 259.38 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:48:51,599 epoch 4 - iter 3000/7500 - loss 0.30114670 - time (sec): 370.01 - samples/sec: 258.77 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:50:23,345 epoch 4 - iter 3750/7500 - loss 0.30007515 - time (sec): 461.76 - samples/sec: 259.19 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:51:56,749 epoch 4 - iter 4500/7500 - loss 0.29780444 - time (sec): 555.16 - samples/sec: 259.74 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:53:29,783 epoch 4 - iter 5250/7500 - loss 0.29632423 - time (sec): 648.20 - samples/sec: 260.29 - lr: 0.000004 - momentum: 0.000000
2023-11-16 09:55:03,332 epoch 4 - iter 6000/7500 - loss 0.29715629 - time (sec): 741.75 - samples/sec: 259.64 - lr: 0.000003 - momentum: 0.000000
2023-11-16 09:56:39,271 epoch 4 - iter 6750/7500 - loss 0.29789543 - time (sec): 837.69 - samples/sec: 258.76 - lr: 0.000003 - momentum: 0.000000
2023-11-16 09:58:14,979 epoch 4 - iter 7500/7500 - loss 0.30080354 - time (sec): 933.39 - samples/sec: 257.98 - lr: 0.000003 - momentum: 0.000000
2023-11-16 09:58:14,982 ----------------------------------------------------------------------------------------------------
2023-11-16 09:58:14,982 EPOCH 4 done: loss 0.3008 - lr: 0.000003
2023-11-16 09:58:42,391 DEV : loss 0.24168777465820312 - f1-score (micro avg)  0.8958
2023-11-16 09:58:44,896 saving best model
2023-11-16 09:58:47,890 ----------------------------------------------------------------------------------------------------
2023-11-16 10:00:21,193 epoch 5 - iter 750/7500 - loss 0.24325762 - time (sec): 93.30 - samples/sec: 254.74 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:01:54,947 epoch 5 - iter 1500/7500 - loss 0.24699916 - time (sec): 187.05 - samples/sec: 256.42 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:03:33,373 epoch 5 - iter 2250/7500 - loss 0.24105182 - time (sec): 285.48 - samples/sec: 251.76 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:05:10,591 epoch 5 - iter 3000/7500 - loss 0.24548635 - time (sec): 382.70 - samples/sec: 250.75 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:06:45,160 epoch 5 - iter 3750/7500 - loss 0.24697996 - time (sec): 477.27 - samples/sec: 252.00 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:08:18,840 epoch 5 - iter 4500/7500 - loss 0.24902921 - time (sec): 570.95 - samples/sec: 252.17 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:09:54,495 epoch 5 - iter 5250/7500 - loss 0.24900570 - time (sec): 666.60 - samples/sec: 251.98 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:11:28,132 epoch 5 - iter 6000/7500 - loss 0.25246330 - time (sec): 760.24 - samples/sec: 253.26 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:13:02,532 epoch 5 - iter 6750/7500 - loss 0.25090384 - time (sec): 854.64 - samples/sec: 253.77 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:14:33,980 epoch 5 - iter 7500/7500 - loss 0.25096782 - time (sec): 946.09 - samples/sec: 254.52 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:14:33,983 ----------------------------------------------------------------------------------------------------
2023-11-16 10:14:33,983 EPOCH 5 done: loss 0.2510 - lr: 0.000003
2023-11-16 10:15:01,770 DEV : loss 0.30133897066116333 - f1-score (micro avg)  0.8909
2023-11-16 10:15:04,505 ----------------------------------------------------------------------------------------------------
2023-11-16 10:16:43,786 epoch 6 - iter 750/7500 - loss 0.21043668 - time (sec): 99.28 - samples/sec: 246.62 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:18:21,438 epoch 6 - iter 1500/7500 - loss 0.21551137 - time (sec): 196.93 - samples/sec: 244.08 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:19:58,967 epoch 6 - iter 2250/7500 - loss 0.21671764 - time (sec): 294.46 - samples/sec: 244.90 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:21:36,357 epoch 6 - iter 3000/7500 - loss 0.21410789 - time (sec): 391.85 - samples/sec: 245.03 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:23:11,820 epoch 6 - iter 3750/7500 - loss 0.21190957 - time (sec): 487.31 - samples/sec: 246.63 - lr: 0.000003 - momentum: 0.000000
2023-11-16 10:24:44,519 epoch 6 - iter 4500/7500 - loss 0.21724671 - time (sec): 580.01 - samples/sec: 248.11 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:26:19,240 epoch 6 - iter 5250/7500 - loss 0.21517436 - time (sec): 674.73 - samples/sec: 249.52 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:27:52,483 epoch 6 - iter 6000/7500 - loss 0.21502133 - time (sec): 767.97 - samples/sec: 250.59 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:29:26,618 epoch 6 - iter 6750/7500 - loss 0.21284089 - time (sec): 862.11 - samples/sec: 251.34 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:31:02,401 epoch 6 - iter 7500/7500 - loss 0.21376183 - time (sec): 957.89 - samples/sec: 251.38 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:31:02,403 ----------------------------------------------------------------------------------------------------
2023-11-16 10:31:02,404 EPOCH 6 done: loss 0.2138 - lr: 0.000002
2023-11-16 10:31:29,893 DEV : loss 0.2858603894710541 - f1-score (micro avg)  0.9013
2023-11-16 10:31:32,354 saving best model
2023-11-16 10:31:34,717 ----------------------------------------------------------------------------------------------------
2023-11-16 10:33:10,947 epoch 7 - iter 750/7500 - loss 0.18560997 - time (sec): 96.22 - samples/sec: 252.05 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:34:45,274 epoch 7 - iter 1500/7500 - loss 0.18388762 - time (sec): 190.55 - samples/sec: 254.81 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:36:20,608 epoch 7 - iter 2250/7500 - loss 0.17300910 - time (sec): 285.89 - samples/sec: 254.67 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:37:56,122 epoch 7 - iter 3000/7500 - loss 0.18185895 - time (sec): 381.40 - samples/sec: 253.44 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:39:29,201 epoch 7 - iter 3750/7500 - loss 0.18240739 - time (sec): 474.48 - samples/sec: 253.00 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:41:03,810 epoch 7 - iter 4500/7500 - loss 0.18167213 - time (sec): 569.09 - samples/sec: 253.57 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:42:34,871 epoch 7 - iter 5250/7500 - loss 0.18305956 - time (sec): 660.15 - samples/sec: 255.17 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:44:06,443 epoch 7 - iter 6000/7500 - loss 0.18397991 - time (sec): 751.72 - samples/sec: 256.24 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:45:40,244 epoch 7 - iter 6750/7500 - loss 0.18284928 - time (sec): 845.52 - samples/sec: 256.33 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:47:13,133 epoch 7 - iter 7500/7500 - loss 0.18356346 - time (sec): 938.41 - samples/sec: 256.60 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:47:13,135 ----------------------------------------------------------------------------------------------------
2023-11-16 10:47:13,136 EPOCH 7 done: loss 0.1836 - lr: 0.000002
2023-11-16 10:47:40,667 DEV : loss 0.3011305034160614 - f1-score (micro avg)  0.8987
2023-11-16 10:47:42,779 ----------------------------------------------------------------------------------------------------
2023-11-16 10:49:16,539 epoch 8 - iter 750/7500 - loss 0.14330999 - time (sec): 93.76 - samples/sec: 256.81 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:50:50,862 epoch 8 - iter 1500/7500 - loss 0.14160047 - time (sec): 188.08 - samples/sec: 252.69 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:52:23,414 epoch 8 - iter 2250/7500 - loss 0.14478770 - time (sec): 280.63 - samples/sec: 256.60 - lr: 0.000002 - momentum: 0.000000
2023-11-16 10:53:56,497 epoch 8 - iter 3000/7500 - loss 0.14930840 - time (sec): 373.71 - samples/sec: 258.22 - lr: 0.000001 - momentum: 0.000000
2023-11-16 10:55:28,192 epoch 8 - iter 3750/7500 - loss 0.15175926 - time (sec): 465.41 - samples/sec: 258.55 - lr: 0.000001 - momentum: 0.000000
2023-11-16 10:57:00,742 epoch 8 - iter 4500/7500 - loss 0.15482872 - time (sec): 557.96 - samples/sec: 259.33 - lr: 0.000001 - momentum: 0.000000
2023-11-16 10:58:32,426 epoch 8 - iter 5250/7500 - loss 0.15110368 - time (sec): 649.64 - samples/sec: 259.55 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:00:02,971 epoch 8 - iter 6000/7500 - loss 0.15064249 - time (sec): 740.19 - samples/sec: 260.02 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:01:35,563 epoch 8 - iter 6750/7500 - loss 0.15119609 - time (sec): 832.78 - samples/sec: 260.42 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:03:09,780 epoch 8 - iter 7500/7500 - loss 0.15276122 - time (sec): 927.00 - samples/sec: 259.76 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:03:09,783 ----------------------------------------------------------------------------------------------------
2023-11-16 11:03:09,783 EPOCH 8 done: loss 0.1528 - lr: 0.000001
2023-11-16 11:03:37,801 DEV : loss 0.31595587730407715 - f1-score (micro avg)  0.9048
2023-11-16 11:03:40,424 saving best model
2023-11-16 11:03:43,075 ----------------------------------------------------------------------------------------------------
2023-11-16 11:05:19,854 epoch 9 - iter 750/7500 - loss 0.12299891 - time (sec): 96.77 - samples/sec: 253.49 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:06:56,996 epoch 9 - iter 1500/7500 - loss 0.12314833 - time (sec): 193.92 - samples/sec: 248.21 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:08:31,883 epoch 9 - iter 2250/7500 - loss 0.13008764 - time (sec): 288.80 - samples/sec: 250.26 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:10:08,385 epoch 9 - iter 3000/7500 - loss 0.13080561 - time (sec): 385.30 - samples/sec: 250.44 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:11:42,064 epoch 9 - iter 3750/7500 - loss 0.13273049 - time (sec): 478.98 - samples/sec: 251.53 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:13:16,037 epoch 9 - iter 4500/7500 - loss 0.13360348 - time (sec): 572.96 - samples/sec: 251.77 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:14:49,720 epoch 9 - iter 5250/7500 - loss 0.13299160 - time (sec): 666.64 - samples/sec: 252.02 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:16:23,602 epoch 9 - iter 6000/7500 - loss 0.13133134 - time (sec): 760.52 - samples/sec: 252.84 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:17:56,438 epoch 9 - iter 6750/7500 - loss 0.13363917 - time (sec): 853.36 - samples/sec: 253.29 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:19:29,919 epoch 9 - iter 7500/7500 - loss 0.13122225 - time (sec): 946.84 - samples/sec: 254.32 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:19:29,921 ----------------------------------------------------------------------------------------------------
2023-11-16 11:19:29,921 EPOCH 9 done: loss 0.1312 - lr: 0.000001
2023-11-16 11:19:57,777 DEV : loss 0.33471381664276123 - f1-score (micro avg)  0.9024
2023-11-16 11:20:00,060 ----------------------------------------------------------------------------------------------------
2023-11-16 11:21:34,000 epoch 10 - iter 750/7500 - loss 0.11743559 - time (sec): 93.94 - samples/sec: 260.10 - lr: 0.000001 - momentum: 0.000000
2023-11-16 11:23:09,621 epoch 10 - iter 1500/7500 - loss 0.12274941 - time (sec): 189.56 - samples/sec: 254.16 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:24:43,426 epoch 10 - iter 2250/7500 - loss 0.11588386 - time (sec): 283.36 - samples/sec: 255.95 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:26:20,492 epoch 10 - iter 3000/7500 - loss 0.11238624 - time (sec): 380.43 - samples/sec: 255.77 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:27:52,662 epoch 10 - iter 3750/7500 - loss 0.11433172 - time (sec): 472.60 - samples/sec: 256.50 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:29:26,109 epoch 10 - iter 4500/7500 - loss 0.11525478 - time (sec): 566.05 - samples/sec: 256.49 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:30:58,606 epoch 10 - iter 5250/7500 - loss 0.11793983 - time (sec): 658.54 - samples/sec: 256.98 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:32:30,455 epoch 10 - iter 6000/7500 - loss 0.11586937 - time (sec): 750.39 - samples/sec: 257.62 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:34:03,956 epoch 10 - iter 6750/7500 - loss 0.11407474 - time (sec): 843.89 - samples/sec: 257.08 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:35:37,383 epoch 10 - iter 7500/7500 - loss 0.11399531 - time (sec): 937.32 - samples/sec: 256.90 - lr: 0.000000 - momentum: 0.000000
2023-11-16 11:35:37,386 ----------------------------------------------------------------------------------------------------
2023-11-16 11:35:37,386 EPOCH 10 done: loss 0.1140 - lr: 0.000000
2023-11-16 11:36:05,240 DEV : loss 0.3250023126602173 - f1-score (micro avg)  0.9045
2023-11-16 11:36:08,837 ----------------------------------------------------------------------------------------------------
2023-11-16 11:36:08,840 Loading model from best epoch ...
2023-11-16 11:36:16,806 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER
2023-11-16 11:36:45,465 
Results:
- F-score (micro) 0.9042
- F-score (macro) 0.9031
- Accuracy 0.8544

By class:
              precision    recall  f1-score   support

         LOC     0.9076    0.9106    0.9091      5288
         PER     0.9288    0.9419    0.9353      3962
         ORG     0.8606    0.8695    0.8650      3807

   micro avg     0.9004    0.9081    0.9042     13057
   macro avg     0.8990    0.9073    0.9031     13057
weighted avg     0.9004    0.9081    0.9042     13057

2023-11-16 11:36:45,466 ----------------------------------------------------------------------------------------------------
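
Finally, a short usage sketch for the trained model: the best checkpoint saved under the base path above can be loaded and applied to new text with Flair's standard load/predict API. The example sentence is illustrative:

    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(
        "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-5/best-model.pt"
    )

    # "Giorgi Margvelashvili was born in Tbilisi." (illustrative Georgian example)
    sentence = Sentence("გიორგი მარგველაშვილი დაიბადა თბილისში.")
    tagger.predict(sentence)

    for span in sentence.get_spans("ner"):
        print(span)  # each detected PER/LOC/ORG span with its confidence score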