Commit 7ca3dda by pjox: "Uploaded the model"
2022-02-05 01:08:47,419 ----------------------------------------------------------------------------------------------------
2022-02-05 01:08:47,461 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): RobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(32768, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (1)-(11): 11 more RobertaLayer blocks, identical in structure to (0)
        )
      )
      (pooler): RobertaPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=18, bias=True)
  (beta): 1.0
  (weights): None
  (weight_tensor): None
)"
2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
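
The module tree above is a Flair SequenceTagger: a 12-layer RoBERTa-base encoder (32768-token vocabulary, hidden size 768) feeding word and locked dropout and a single 768 -> 18 linear tag head, with no RNN and no CRF. A minimal construction sketch, assuming Flair's standard API; the checkpoint id "pjox/dalembert" and the corpus path are placeholders, not taken from the log:

    from flair.datasets import ColumnCorpus
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger

    # Placeholder corpus: two-column CoNLL-style files (token, NER tag).
    corpus = ColumnCorpus("resources/data", {0: "text", 1: "ner"})
    tag_dictionary = corpus.make_label_dictionary(label_type="ner")

    # Assumed base checkpoint: a French RoBERTa (D'AlemBERT) matching the
    # printed shapes (vocab 32768, 12 layers, hidden size 768).
    embeddings = TransformerWordEmbeddings("pjox/dalembert", fine_tune=True)

    # No RNN and no CRF: the repr above shows only the linear 768 -> 18 head.
    tagger = SequenceTagger(
        hidden_size=256,  # ignored when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_rnn=False,
        use_crf=False,
        reproject_embeddings=False,
    )
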
2022-02-05 01:08:47,466 Corpus: "Corpus: 126973 train + 7037 dev + 7090 test sentences"
2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
2022-02-05 01:08:47,466 Parameters:
2022-02-05 01:08:47,466 - learning_rate: "5e-05"
2022-02-05 01:08:47,466 - mini_batch_size: "16"
2022-02-05 01:08:47,466 - patience: "3"
2022-02-05 01:08:47,466 - anneal_factor: "0.5"
2022-02-05 01:08:47,466 - max_epochs: "10"
2022-02-05 01:08:47,466 - shuffle: "True"
2022-02-05 01:08:47,466 - train_with_dev: "False"
2022-02-05 01:08:47,466 - batch_growth_annealing: "False"
2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
2022-02-05 01:08:47,466 Model training base path: "resources/taggers/ner-dalembert-2ndtry"
2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
2022-02-05 01:08:47,466 Device: cuda:0
2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
2022-02-05 01:08:47,467 Embeddings storage mode: none
2022-02-05 01:08:47,469 ----------------------------------------------------------------------------------------------------
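
These parameters map directly onto Flair's ModelTrainer. Note that the logged learning rate warms up to 5e-5 during epoch 1 and then decays linearly to 0 by epoch 10, which matches Flair's fine-tuning (linear warmup/decay) schedule rather than the patience-based annealing that the patience and anneal_factor entries would otherwise suggest; those values are printed regardless of the schedule in use. A sketch under that assumption:

    from flair.trainers import ModelTrainer

    trainer = ModelTrainer(tagger, corpus)

    # fine_tune() uses AdamW with a linear warmup/decay schedule, which is
    # the shape the per-iteration "lr:" column below traces out.
    trainer.fine_tune(
        "resources/taggers/ner-dalembert-2ndtry",
        learning_rate=5e-5,
        mini_batch_size=16,
        max_epochs=10,
        embeddings_storage_mode="none",
    )
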
2022-02-05 01:15:08,771 epoch 1 - iter 793/7936 - loss 0.78007372 - samples/sec: 33.28 - lr: 0.000005
2022-02-05 01:22:45,940 epoch 1 - iter 1586/7936 - loss 0.41932043 - samples/sec: 27.76 - lr: 0.000010
2022-02-05 01:29:23,897 epoch 1 - iter 2379/7936 - loss 0.33514542 - samples/sec: 31.89 - lr: 0.000015
2022-02-05 01:35:24,915 epoch 1 - iter 3172/7936 - loss 0.30212998 - samples/sec: 35.15 - lr: 0.000020
2022-02-05 01:42:28,297 epoch 1 - iter 3965/7936 - loss 0.27341208 - samples/sec: 29.97 - lr: 0.000025
2022-02-05 01:49:23,543 epoch 1 - iter 4758/7936 - loss 0.25403588 - samples/sec: 30.56 - lr: 0.000030
2022-02-05 01:55:46,783 epoch 1 - iter 5551/7936 - loss 0.24241496 - samples/sec: 33.11 - lr: 0.000035
2022-02-05 02:01:45,654 epoch 1 - iter 6344/7936 - loss 0.23381719 - samples/sec: 35.36 - lr: 0.000040
2022-02-05 02:07:29,407 epoch 1 - iter 7137/7936 - loss 0.22586308 - samples/sec: 36.92 - lr: 0.000045
2022-02-05 02:13:54,603 epoch 1 - iter 7930/7936 - loss 0.21834611 - samples/sec: 32.94 - lr: 0.000050
2022-02-05 02:13:57,692 ----------------------------------------------------------------------------------------------------
2022-02-05 02:13:57,693 EPOCH 1 done: loss 0.2183 - lr 0.0000500
2022-02-05 02:16:47,190 DEV : loss 0.0355144739151001 - f1-score (micro avg) 0.8254
2022-02-05 02:16:47,244 BAD EPOCHS (no improvement): 4
2022-02-05 02:16:47,244 ----------------------------------------------------------------------------------------------------
2022-02-05 02:23:15,435 epoch 2 - iter 793/7936 - loss 0.14903310 - samples/sec: 32.69 - lr: 0.000049
2022-02-05 02:30:06,605 epoch 2 - iter 1586/7936 - loss 0.14777394 - samples/sec: 30.86 - lr: 0.000049
2022-02-05 02:36:48,570 epoch 2 - iter 2379/7936 - loss 0.14637300 - samples/sec: 31.57 - lr: 0.000048
2022-02-05 02:43:37,172 epoch 2 - iter 3172/7936 - loss 0.14491485 - samples/sec: 31.06 - lr: 0.000048
2022-02-05 02:50:13,040 epoch 2 - iter 3965/7936 - loss 0.14361996 - samples/sec: 32.06 - lr: 0.000047
2022-02-05 02:56:49,904 epoch 2 - iter 4758/7936 - loss 0.14232123 - samples/sec: 31.98 - lr: 0.000047
2022-02-05 03:03:34,383 epoch 2 - iter 5551/7936 - loss 0.14116820 - samples/sec: 31.38 - lr: 0.000046
2022-02-05 03:10:09,778 epoch 2 - iter 6344/7936 - loss 0.14001072 - samples/sec: 32.10 - lr: 0.000046
2022-02-05 03:16:43,847 epoch 2 - iter 7137/7936 - loss 0.13868572 - samples/sec: 32.20 - lr: 0.000045
2022-02-05 03:23:28,994 epoch 2 - iter 7930/7936 - loss 0.13731517 - samples/sec: 31.33 - lr: 0.000044
2022-02-05 03:23:31,622 ----------------------------------------------------------------------------------------------------
2022-02-05 03:23:31,623 EPOCH 2 done: loss 0.1373 - lr 0.0000444
2022-02-05 03:26:13,727 DEV : loss 0.015243684872984886 - f1-score (micro avg) 0.9132
2022-02-05 03:26:13,788 BAD EPOCHS (no improvement): 4
2022-02-05 03:26:13,806 ----------------------------------------------------------------------------------------------------
2022-02-05 03:32:57,765 epoch 3 - iter 793/7936 - loss 0.11924788 - samples/sec: 31.42 - lr: 0.000044
2022-02-05 03:39:33,229 epoch 3 - iter 1586/7936 - loss 0.11867811 - samples/sec: 32.09 - lr: 0.000043
2022-02-05 03:46:09,619 epoch 3 - iter 2379/7936 - loss 0.11819415 - samples/sec: 32.01 - lr: 0.000043
2022-02-05 03:52:49,510 epoch 3 - iter 3172/7936 - loss 0.11779082 - samples/sec: 31.74 - lr: 0.000042
2022-02-05 03:59:27,917 epoch 3 - iter 3965/7936 - loss 0.11691604 - samples/sec: 31.85 - lr: 0.000042
2022-02-05 04:06:01,365 epoch 3 - iter 4758/7936 - loss 0.11592267 - samples/sec: 32.26 - lr: 0.000041
2022-02-05 04:12:41,174 epoch 3 - iter 5551/7936 - loss 0.11480043 - samples/sec: 31.74 - lr: 0.000041
2022-02-05 04:19:14,243 epoch 3 - iter 6344/7936 - loss 0.11389582 - samples/sec: 32.29 - lr: 0.000040
2022-02-05 04:25:45,192 epoch 3 - iter 7137/7936 - loss 0.11289267 - samples/sec: 32.46 - lr: 0.000039
2022-02-05 04:32:26,310 epoch 3 - iter 7930/7936 - loss 0.11196899 - samples/sec: 31.64 - lr: 0.000039
2022-02-05 04:32:29,352 ----------------------------------------------------------------------------------------------------
2022-02-05 04:32:29,353 EPOCH 3 done: loss 0.1120 - lr 0.0000389
2022-02-05 04:35:09,639 DEV : loss 0.016585879027843475 - f1-score (micro avg) 0.9229
2022-02-05 04:35:09,698 BAD EPOCHS (no improvement): 4
2022-02-05 04:35:09,698 ----------------------------------------------------------------------------------------------------
2022-02-05 04:41:46,821 epoch 4 - iter 793/7936 - loss 0.09739851 - samples/sec: 31.96 - lr: 0.000038
2022-02-05 04:48:23,504 epoch 4 - iter 1586/7936 - loss 0.09750632 - samples/sec: 31.99 - lr: 0.000038
2022-02-05 04:55:05,833 epoch 4 - iter 2379/7936 - loss 0.09636659 - samples/sec: 31.54 - lr: 0.000037
2022-02-05 05:01:34,951 epoch 4 - iter 3172/7936 - loss 0.09583742 - samples/sec: 32.61 - lr: 0.000037
2022-02-05 05:08:07,163 epoch 4 - iter 3965/7936 - loss 0.09518243 - samples/sec: 32.36 - lr: 0.000036
2022-02-05 05:14:50,781 epoch 4 - iter 4758/7936 - loss 0.09444265 - samples/sec: 31.44 - lr: 0.000036
2022-02-05 05:21:24,983 epoch 4 - iter 5551/7936 - loss 0.09374740 - samples/sec: 32.19 - lr: 0.000035
2022-02-05 05:27:54,052 epoch 4 - iter 6344/7936 - loss 0.09321236 - samples/sec: 32.62 - lr: 0.000034
2022-02-05 05:34:32,228 epoch 4 - iter 7137/7936 - loss 0.09231997 - samples/sec: 31.87 - lr: 0.000034
2022-02-05 05:41:08,580 epoch 4 - iter 7930/7936 - loss 0.09147929 - samples/sec: 32.02 - lr: 0.000033
2022-02-05 05:41:11,479 ----------------------------------------------------------------------------------------------------
2022-02-05 05:41:11,479 EPOCH 4 done: loss 0.0915 - lr 0.0000333
2022-02-05 05:44:00,197 DEV : loss 0.016923826187849045 - f1-score (micro avg) 0.9213
2022-02-05 05:44:00,256 BAD EPOCHS (no improvement): 4
2022-02-05 05:44:00,270 ----------------------------------------------------------------------------------------------------
2022-02-05 05:50:27,537 epoch 5 - iter 793/7936 - loss 0.07986125 - samples/sec: 32.77 - lr: 0.000033
2022-02-05 05:56:56,203 epoch 5 - iter 1586/7936 - loss 0.08031745 - samples/sec: 32.65 - lr: 0.000032
2022-02-05 06:03:34,109 epoch 5 - iter 2379/7936 - loss 0.07984185 - samples/sec: 31.89 - lr: 0.000032
2022-02-05 06:10:03,550 epoch 5 - iter 3172/7936 - loss 0.07905074 - samples/sec: 32.59 - lr: 0.000031
2022-02-05 06:16:30,085 epoch 5 - iter 3965/7936 - loss 0.07843193 - samples/sec: 32.83 - lr: 0.000031
2022-02-05 06:23:10,671 epoch 5 - iter 4758/7936 - loss 0.07785540 - samples/sec: 31.68 - lr: 0.000030
2022-02-05 06:29:45,063 epoch 5 - iter 5551/7936 - loss 0.07709413 - samples/sec: 32.18 - lr: 0.000029
2022-02-05 06:36:23,513 epoch 5 - iter 6344/7936 - loss 0.07634510 - samples/sec: 31.85 - lr: 0.000029
2022-02-05 06:42:51,615 epoch 5 - iter 7137/7936 - loss 0.07566508 - samples/sec: 32.70 - lr: 0.000028
2022-02-05 06:49:23,409 epoch 5 - iter 7930/7936 - loss 0.07495508 - samples/sec: 32.39 - lr: 0.000028
2022-02-05 06:49:26,372 ----------------------------------------------------------------------------------------------------
2022-02-05 06:49:26,373 EPOCH 5 done: loss 0.0750 - lr 0.0000278
2022-02-05 06:52:15,459 DEV : loss 0.017464155331254005 - f1-score (micro avg) 0.9311
2022-02-05 06:52:15,518 BAD EPOCHS (no improvement): 4
2022-02-05 06:52:15,518 ----------------------------------------------------------------------------------------------------
2022-02-05 06:58:49,072 epoch 6 - iter 793/7936 - loss 0.06552824 - samples/sec: 32.25 - lr: 0.000027
2022-02-05 07:05:27,796 epoch 6 - iter 1586/7936 - loss 0.06569517 - samples/sec: 31.83 - lr: 0.000027
2022-02-05 07:11:58,162 epoch 6 - iter 2379/7936 - loss 0.06536467 - samples/sec: 32.51 - lr: 0.000026
2022-02-05 07:18:25,878 epoch 6 - iter 3172/7936 - loss 0.06467146 - samples/sec: 32.73 - lr: 0.000026
2022-02-05 07:25:10,562 epoch 6 - iter 3965/7936 - loss 0.06426965 - samples/sec: 31.36 - lr: 0.000025
2022-02-05 07:31:39,437 epoch 6 - iter 4758/7936 - loss 0.06371305 - samples/sec: 32.63 - lr: 0.000024
2022-02-05 07:38:08,323 epoch 6 - iter 5551/7936 - loss 0.06328229 - samples/sec: 32.63 - lr: 0.000024
2022-02-05 07:44:52,176 epoch 6 - iter 6344/7936 - loss 0.06272143 - samples/sec: 31.42 - lr: 0.000023
2022-02-05 07:51:20,507 epoch 6 - iter 7137/7936 - loss 0.06218937 - samples/sec: 32.68 - lr: 0.000023
2022-02-05 07:57:52,828 epoch 6 - iter 7930/7936 - loss 0.06175113 - samples/sec: 32.35 - lr: 0.000022
2022-02-05 07:57:55,686 ----------------------------------------------------------------------------------------------------
2022-02-05 07:57:55,687 EPOCH 6 done: loss 0.0617 - lr 0.0000222
2022-02-05 08:00:45,565 DEV : loss 0.01982131227850914 - f1-score (micro avg) 0.9358
2022-02-05 08:00:45,625 BAD EPOCHS (no improvement): 4
2022-02-05 08:00:45,644 ----------------------------------------------------------------------------------------------------
2022-02-05 08:07:26,967 epoch 7 - iter 793/7936 - loss 0.05520420 - samples/sec: 31.62 - lr: 0.000022
2022-02-05 08:13:58,782 epoch 7 - iter 1586/7936 - loss 0.05522964 - samples/sec: 32.39 - lr: 0.000021
2022-02-05 08:20:32,705 epoch 7 - iter 2379/7936 - loss 0.05482898 - samples/sec: 32.21 - lr: 0.000021
2022-02-05 08:27:14,353 epoch 7 - iter 3172/7936 - loss 0.05433105 - samples/sec: 31.59 - lr: 0.000020
2022-02-05 08:33:45,236 epoch 7 - iter 3965/7936 - loss 0.05397125 - samples/sec: 32.47 - lr: 0.000019
2022-02-05 08:40:14,072 epoch 7 - iter 4758/7936 - loss 0.05348281 - samples/sec: 32.64 - lr: 0.000019
2022-02-05 08:46:52,674 epoch 7 - iter 5551/7936 - loss 0.05316673 - samples/sec: 31.84 - lr: 0.000018
2022-02-05 08:53:20,653 epoch 7 - iter 6344/7936 - loss 0.05275831 - samples/sec: 32.71 - lr: 0.000018
2022-02-05 08:59:52,741 epoch 7 - iter 7137/7936 - loss 0.05230036 - samples/sec: 32.37 - lr: 0.000017
2022-02-05 09:06:38,983 epoch 7 - iter 7930/7936 - loss 0.05190552 - samples/sec: 31.24 - lr: 0.000017
2022-02-05 09:06:41,639 ----------------------------------------------------------------------------------------------------
2022-02-05 09:06:41,639 EPOCH 7 done: loss 0.0519 - lr 0.0000167
2022-02-05 09:09:20,864 DEV : loss 0.02467426098883152 - f1-score (micro avg) 0.9355
2022-02-05 09:09:20,924 BAD EPOCHS (no improvement): 4
2022-02-05 09:09:20,939 ----------------------------------------------------------------------------------------------------
2022-02-05 09:16:05,134 epoch 8 - iter 793/7936 - loss 0.04726178 - samples/sec: 31.40 - lr: 0.000016
2022-02-05 09:22:33,870 epoch 8 - iter 1586/7936 - loss 0.04719666 - samples/sec: 32.64 - lr: 0.000016
2022-02-05 09:29:02,929 epoch 8 - iter 2379/7936 - loss 0.04663752 - samples/sec: 32.62 - lr: 0.000015
2022-02-05 09:35:42,369 epoch 8 - iter 3172/7936 - loss 0.04634901 - samples/sec: 31.77 - lr: 0.000014
2022-02-05 09:42:14,843 epoch 8 - iter 3965/7936 - loss 0.04602895 - samples/sec: 32.33 - lr: 0.000014
2022-02-05 09:48:48,062 epoch 8 - iter 4758/7936 - loss 0.04582764 - samples/sec: 32.27 - lr: 0.000013
2022-02-05 09:55:28,863 epoch 8 - iter 5551/7936 - loss 0.04566599 - samples/sec: 31.66 - lr: 0.000013
2022-02-05 10:01:52,699 epoch 8 - iter 6344/7936 - loss 0.04545939 - samples/sec: 33.06 - lr: 0.000012
2022-02-05 10:08:33,137 epoch 8 - iter 7137/7936 - loss 0.04526206 - samples/sec: 31.69 - lr: 0.000012
2022-02-05 10:15:07,241 epoch 8 - iter 7930/7936 - loss 0.04503385 - samples/sec: 32.20 - lr: 0.000011
2022-02-05 10:15:10,600 ----------------------------------------------------------------------------------------------------
2022-02-05 10:15:10,600 EPOCH 8 done: loss 0.0450 - lr 0.0000111
2022-02-05 10:18:00,280 DEV : loss 0.02364770695567131 - f1-score (micro avg) 0.9371
2022-02-05 10:18:00,339 BAD EPOCHS (no improvement): 4
2022-02-05 10:18:00,358 ----------------------------------------------------------------------------------------------------
2022-02-05 10:24:31,011 epoch 9 - iter 793/7936 - loss 0.04122325 - samples/sec: 32.48 - lr: 0.000011
2022-02-05 10:31:00,279 epoch 9 - iter 1586/7936 - loss 0.04130931 - samples/sec: 32.60 - lr: 0.000010
2022-02-05 10:37:40,369 epoch 9 - iter 2379/7936 - loss 0.04131112 - samples/sec: 31.72 - lr: 0.000009
2022-02-05 10:44:11,067 epoch 9 - iter 3172/7936 - loss 0.04141124 - samples/sec: 32.48 - lr: 0.000009
2022-02-05 10:50:41,270 epoch 9 - iter 3965/7936 - loss 0.04120608 - samples/sec: 32.52 - lr: 0.000008
2022-02-05 10:57:24,718 epoch 9 - iter 4758/7936 - loss 0.04108655 - samples/sec: 31.45 - lr: 0.000008
2022-02-05 11:04:00,581 epoch 9 - iter 5551/7936 - loss 0.04093370 - samples/sec: 32.06 - lr: 0.000007
2022-02-05 11:10:31,042 epoch 9 - iter 6344/7936 - loss 0.04078404 - samples/sec: 32.50 - lr: 0.000007
2022-02-05 11:17:13,751 epoch 9 - iter 7137/7936 - loss 0.04061073 - samples/sec: 31.51 - lr: 0.000006
2022-02-05 11:23:44,231 epoch 9 - iter 7930/7936 - loss 0.04050638 - samples/sec: 32.50 - lr: 0.000006
2022-02-05 11:23:47,941 ----------------------------------------------------------------------------------------------------
2022-02-05 11:23:47,942 EPOCH 9 done: loss 0.0405 - lr 0.0000056
2022-02-05 11:26:37,114 DEV : loss 0.026182951405644417 - f1-score (micro avg) 0.9361
2022-02-05 11:26:37,173 BAD EPOCHS (no improvement): 4
2022-02-05 11:26:37,186 ----------------------------------------------------------------------------------------------------
2022-02-05 11:33:05,778 epoch 10 - iter 793/7936 - loss 0.03876526 - samples/sec: 32.66 - lr: 0.000005
2022-02-05 11:39:45,501 epoch 10 - iter 1586/7936 - loss 0.03871561 - samples/sec: 31.75 - lr: 0.000004
2022-02-05 11:46:18,242 epoch 10 - iter 2379/7936 - loss 0.03842790 - samples/sec: 32.31 - lr: 0.000004
2022-02-05 11:52:48,370 epoch 10 - iter 3172/7936 - loss 0.03820246 - samples/sec: 32.53 - lr: 0.000003
2022-02-05 11:59:28,420 epoch 10 - iter 3965/7936 - loss 0.03807900 - samples/sec: 31.72 - lr: 0.000003
2022-02-05 12:05:57,882 epoch 10 - iter 4758/7936 - loss 0.03798954 - samples/sec: 32.58 - lr: 0.000002
2022-02-05 12:12:25,766 epoch 10 - iter 5551/7936 - loss 0.03803371 - samples/sec: 32.72 - lr: 0.000002
2022-02-05 12:19:03,411 epoch 10 - iter 6344/7936 - loss 0.03805844 - samples/sec: 31.91 - lr: 0.000001
2022-02-05 12:25:27,539 epoch 10 - iter 7137/7936 - loss 0.03799490 - samples/sec: 33.04 - lr: 0.000001
2022-02-05 12:31:55,442 epoch 10 - iter 7930/7936 - loss 0.03798541 - samples/sec: 32.71 - lr: 0.000000
2022-02-05 12:31:58,461 ----------------------------------------------------------------------------------------------------
2022-02-05 12:31:58,462 EPOCH 10 done: loss 0.0380 - lr 0.0000000
2022-02-05 12:34:45,700 DEV : loss 0.027400659397244453 - f1-score (micro avg) 0.9368
2022-02-05 12:34:45,760 BAD EPOCHS (no improvement): 4
2022-02-05 12:34:46,755 ----------------------------------------------------------------------------------------------------
2022-02-05 12:34:46,757 Testing using last state of model ...
2022-02-05 12:37:34,421 0.9329 0.9323 0.9326 0.8893  (micro precision / recall / F1 / accuracy)
2022-02-05 12:37:34,422
Results:
- F-score (micro) 0.9326
- F-score (macro) 0.9111
- Accuracy 0.8893
By class:
              precision    recall  f1-score   support

        pers     0.9355    0.9279    0.9317      2734
         loc     0.9242    0.9335    0.9288      1384
      amount     0.9800    0.9800    0.9800       250
        time     0.9456    0.9576    0.9516       236
        func     0.9333    0.9000    0.9164       140
         org     0.8148    0.8980    0.8544        49
        prod     0.8621    0.9259    0.8929        27
       event     0.8333    0.8333    0.8333        12

   micro avg     0.9329    0.9323    0.9326      4832
   macro avg     0.9036    0.9195    0.9111      4832
weighted avg     0.9331    0.9323    0.9327      4832
 samples avg     0.8893    0.8893    0.8893      4832
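
As a quick sanity check on the summary rows: the macro row is the unweighted mean of the eight per-class F1 scores, while the micro row pools true/false positives over all 4832 gold spans, which is why micro F1 (0.9326) sits close to the scores of the large pers and loc classes:

    # Unweighted mean of the per-class F1 scores from the table above.
    f1_by_class = [0.9317, 0.9288, 0.9800, 0.9516, 0.9164, 0.8544, 0.8929, 0.8333]
    print(round(sum(f1_by_class) / len(f1_by_class), 4))  # 0.9111, the macro avg
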
2022-02-05 12:37:34,422 ----------------------------------------------------------------------------------------------------
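
"Testing using last state of model" means the final-epoch weights were evaluated, not a best-on-dev checkpoint. A minimal usage sketch for the uploaded tagger; the file name final-model.pt is Flair's default save location under the training base path and is assumed here, not read from the log:

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Assumed default path for the last model state saved by Flair.
    tagger = SequenceTagger.load("resources/taggers/ner-dalembert-2ndtry/final-model.pt")

    sentence = Sentence("Jean le Rond d'Alembert est né à Paris.")
    tagger.predict(sentence)

    for span in sentence.get_spans("ner"):
        print(span)
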