2023-10-15 02:31:46,751 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 Model: "SequenceTagger(
(embeddings): ByT5Embeddings(
(model): T5EncoderModel(
(shared): Embedding(384, 1472)
(encoder): T5Stack(
(embed_tokens): Embedding(384, 1472)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=1472, out_features=384, bias=False)
(k): Linear(in_features=1472, out_features=384, bias=False)
(v): Linear(in_features=1472, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=1472, bias=False)
(relative_attention_bias): Embedding(32, 6)
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=1472, out_features=3584, bias=False)
(wi_1): Linear(in_features=1472, out_features=3584, bias=False)
(wo): Linear(in_features=3584, out_features=1472, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1-11): 11 x T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=1472, out_features=384, bias=False)
(k): Linear(in_features=1472, out_features=384, bias=False)
(v): Linear(in_features=1472, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=1472, bias=False)
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=1472, out_features=3584, bias=False)
(wi_1): Linear(in_features=1472, out_features=3584, bias=False)
(wo): Linear(in_features=3584, out_features=1472, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(locked_dropout): LockedDropout(p=0.5)
(linear): Linear(in_features=1472, out_features=21, bias=True)
(loss_function): CrossEntropyLoss()
)"
2023-10-15 02:31:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
- NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-15 02:31:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 Train: 3575 sentences
2023-10-15 02:31:46,752 (train_with_dev=False, train_with_test=False)
2023-10-15 02:31:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 Training Params:
2023-10-15 02:31:46,752 - learning_rate: "0.00016"
2023-10-15 02:31:46,752 - mini_batch_size: "4"
2023-10-15 02:31:46,752 - max_epochs: "10"
2023-10-15 02:31:46,752 - shuffle: "True"
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Plugins:
2023-10-15 02:31:46,753 - TensorboardLogger
2023-10-15 02:31:46,753 - LinearScheduler | warmup_fraction: '0.1'
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Final evaluation on model from best epoch (best-model.pt)
2023-10-15 02:31:46,753 - metric: "('micro avg', 'f1-score')"
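Taken together, the training parameters, the linear warmup scheduler and the best-model selection metric logged above correspond to Flair's fine_tune entry point. A hedged sketch, continuing the tagger and corpus objects from the sketch above, might look as follows; the exact plugin wiring of the original run is not recorded in this log and is assumed.

# Hedged sketch: fine-tuning with the hyperparameters logged above
# (lr 0.00016, mini-batch size 4, 10 epochs, micro-F1 on dev for model selection).
# fine_tune() attaches a linear scheduler with warmup by default; the
# TensorBoard logging shown in the log is left out here.
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-hipe2020/de-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-"
    "poolingfirst-layers-1-crfFalse-4",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
    main_evaluation_metric=("micro avg", "f1-score"),
)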
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Computation:
2023-10-15 02:31:46,753 - compute on device: cuda:0
2023-10-15 02:31:46,753 - embedding storage: none
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Model training base path: "hmbench-hipe2020/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-15 02:32:03,162 epoch 1 - iter 89/894 - loss 3.02344183 - time (sec): 16.41 - samples/sec: 503.29 - lr: 0.000016 - momentum: 0.000000
2023-10-15 02:32:22,172 epoch 1 - iter 178/894 - loss 2.95643439 - time (sec): 35.42 - samples/sec: 517.95 - lr: 0.000032 - momentum: 0.000000
2023-10-15 02:32:39,335 epoch 1 - iter 267/894 - loss 2.78708734 - time (sec): 52.58 - samples/sec: 525.09 - lr: 0.000048 - momentum: 0.000000
2023-10-15 02:32:55,879 epoch 1 - iter 356/894 - loss 2.58160891 - time (sec): 69.13 - samples/sec: 521.14 - lr: 0.000064 - momentum: 0.000000
2023-10-15 02:33:12,285 epoch 1 - iter 445/894 - loss 2.35433260 - time (sec): 85.53 - samples/sec: 520.19 - lr: 0.000079 - momentum: 0.000000
2023-10-15 02:33:29,031 epoch 1 - iter 534/894 - loss 2.10818662 - time (sec): 102.28 - samples/sec: 520.02 - lr: 0.000095 - momentum: 0.000000
2023-10-15 02:33:45,298 epoch 1 - iter 623/894 - loss 1.90435758 - time (sec): 118.54 - samples/sec: 518.72 - lr: 0.000111 - momentum: 0.000000
2023-10-15 02:34:01,309 epoch 1 - iter 712/894 - loss 1.75522635 - time (sec): 134.55 - samples/sec: 514.33 - lr: 0.000127 - momentum: 0.000000
2023-10-15 02:34:17,945 epoch 1 - iter 801/894 - loss 1.61964852 - time (sec): 151.19 - samples/sec: 514.93 - lr: 0.000143 - momentum: 0.000000
2023-10-15 02:34:34,731 epoch 1 - iter 890/894 - loss 1.50331592 - time (sec): 167.98 - samples/sec: 513.77 - lr: 0.000159 - momentum: 0.000000
2023-10-15 02:34:35,368 ----------------------------------------------------------------------------------------------------
2023-10-15 02:34:35,368 EPOCH 1 done: loss 1.5004 - lr: 0.000159
2023-10-15 02:34:58,797 DEV : loss 0.3780643343925476 - f1-score (micro avg) 0.0
2023-10-15 02:34:58,823 ----------------------------------------------------------------------------------------------------
2023-10-15 02:35:15,413 epoch 2 - iter 89/894 - loss 0.35647784 - time (sec): 16.59 - samples/sec: 506.48 - lr: 0.000158 - momentum: 0.000000
2023-10-15 02:35:32,068 epoch 2 - iter 178/894 - loss 0.34495992 - time (sec): 33.24 - samples/sec: 488.93 - lr: 0.000156 - momentum: 0.000000
2023-10-15 02:35:48,763 epoch 2 - iter 267/894 - loss 0.32633621 - time (sec): 49.94 - samples/sec: 498.77 - lr: 0.000155 - momentum: 0.000000
2023-10-15 02:36:05,361 epoch 2 - iter 356/894 - loss 0.30999671 - time (sec): 66.54 - samples/sec: 501.31 - lr: 0.000153 - momentum: 0.000000
2023-10-15 02:36:22,230 epoch 2 - iter 445/894 - loss 0.29402029 - time (sec): 83.41 - samples/sec: 504.29 - lr: 0.000151 - momentum: 0.000000
2023-10-15 02:36:38,633 epoch 2 - iter 534/894 - loss 0.29260383 - time (sec): 99.81 - samples/sec: 504.68 - lr: 0.000149 - momentum: 0.000000
2023-10-15 02:36:55,012 epoch 2 - iter 623/894 - loss 0.28062458 - time (sec): 116.19 - samples/sec: 505.86 - lr: 0.000148 - momentum: 0.000000
2023-10-15 02:37:11,215 epoch 2 - iter 712/894 - loss 0.27449208 - time (sec): 132.39 - samples/sec: 505.71 - lr: 0.000146 - momentum: 0.000000
2023-10-15 02:37:28,432 epoch 2 - iter 801/894 - loss 0.26359435 - time (sec): 149.61 - samples/sec: 509.57 - lr: 0.000144 - momentum: 0.000000
2023-10-15 02:37:47,059 epoch 2 - iter 890/894 - loss 0.25229683 - time (sec): 168.23 - samples/sec: 511.90 - lr: 0.000142 - momentum: 0.000000
2023-10-15 02:37:47,825 ----------------------------------------------------------------------------------------------------
2023-10-15 02:37:47,825 EPOCH 2 done: loss 0.2512 - lr: 0.000142
2023-10-15 02:38:12,900 DEV : loss 0.19516690075397491 - f1-score (micro avg) 0.6578
2023-10-15 02:38:12,926 saving best model
2023-10-15 02:38:13,600 ----------------------------------------------------------------------------------------------------
2023-10-15 02:38:30,620 epoch 3 - iter 89/894 - loss 0.19012779 - time (sec): 17.02 - samples/sec: 538.74 - lr: 0.000140 - momentum: 0.000000
2023-10-15 02:38:47,328 epoch 3 - iter 178/894 - loss 0.15803457 - time (sec): 33.73 - samples/sec: 540.12 - lr: 0.000139 - momentum: 0.000000
2023-10-15 02:39:04,061 epoch 3 - iter 267/894 - loss 0.14197851 - time (sec): 50.46 - samples/sec: 531.87 - lr: 0.000137 - momentum: 0.000000
2023-10-15 02:39:20,354 epoch 3 - iter 356/894 - loss 0.14302842 - time (sec): 66.75 - samples/sec: 525.00 - lr: 0.000135 - momentum: 0.000000
2023-10-15 02:39:36,895 epoch 3 - iter 445/894 - loss 0.13867980 - time (sec): 83.29 - samples/sec: 524.43 - lr: 0.000133 - momentum: 0.000000
2023-10-15 02:39:52,835 epoch 3 - iter 534/894 - loss 0.13292117 - time (sec): 99.23 - samples/sec: 516.98 - lr: 0.000132 - momentum: 0.000000
2023-10-15 02:40:09,133 epoch 3 - iter 623/894 - loss 0.13020535 - time (sec): 115.53 - samples/sec: 513.60 - lr: 0.000130 - momentum: 0.000000
2023-10-15 02:40:25,404 epoch 3 - iter 712/894 - loss 0.12587355 - time (sec): 131.80 - samples/sec: 512.31 - lr: 0.000128 - momentum: 0.000000
2023-10-15 02:40:43,932 epoch 3 - iter 801/894 - loss 0.12594823 - time (sec): 150.33 - samples/sec: 513.37 - lr: 0.000126 - momentum: 0.000000
2023-10-15 02:41:01,043 epoch 3 - iter 890/894 - loss 0.12151089 - time (sec): 167.44 - samples/sec: 514.04 - lr: 0.000125 - momentum: 0.000000
2023-10-15 02:41:01,800 ----------------------------------------------------------------------------------------------------
2023-10-15 02:41:01,800 EPOCH 3 done: loss 0.1218 - lr: 0.000125
2023-10-15 02:41:27,066 DEV : loss 0.15591633319854736 - f1-score (micro avg) 0.7298
2023-10-15 02:41:27,092 saving best model
2023-10-15 02:41:27,983 ----------------------------------------------------------------------------------------------------
2023-10-15 02:41:44,324 epoch 4 - iter 89/894 - loss 0.07103724 - time (sec): 16.34 - samples/sec: 506.27 - lr: 0.000123 - momentum: 0.000000
2023-10-15 02:42:01,963 epoch 4 - iter 178/894 - loss 0.06257623 - time (sec): 33.98 - samples/sec: 517.81 - lr: 0.000121 - momentum: 0.000000
2023-10-15 02:42:18,072 epoch 4 - iter 267/894 - loss 0.06910373 - time (sec): 50.09 - samples/sec: 511.23 - lr: 0.000119 - momentum: 0.000000
2023-10-15 02:42:34,820 epoch 4 - iter 356/894 - loss 0.07107525 - time (sec): 66.83 - samples/sec: 520.12 - lr: 0.000117 - momentum: 0.000000
2023-10-15 02:42:51,325 epoch 4 - iter 445/894 - loss 0.06802427 - time (sec): 83.34 - samples/sec: 522.37 - lr: 0.000116 - momentum: 0.000000
2023-10-15 02:43:07,291 epoch 4 - iter 534/894 - loss 0.06713645 - time (sec): 99.30 - samples/sec: 519.31 - lr: 0.000114 - momentum: 0.000000
2023-10-15 02:43:23,747 epoch 4 - iter 623/894 - loss 0.06876201 - time (sec): 115.76 - samples/sec: 521.62 - lr: 0.000112 - momentum: 0.000000
2023-10-15 02:43:41,754 epoch 4 - iter 712/894 - loss 0.07019890 - time (sec): 133.77 - samples/sec: 519.61 - lr: 0.000110 - momentum: 0.000000
2023-10-15 02:43:57,924 epoch 4 - iter 801/894 - loss 0.06984123 - time (sec): 149.94 - samples/sec: 518.42 - lr: 0.000109 - momentum: 0.000000
2023-10-15 02:44:14,341 epoch 4 - iter 890/894 - loss 0.06910896 - time (sec): 166.35 - samples/sec: 518.79 - lr: 0.000107 - momentum: 0.000000
2023-10-15 02:44:14,960 ----------------------------------------------------------------------------------------------------
2023-10-15 02:44:14,961 EPOCH 4 done: loss 0.0690 - lr: 0.000107
2023-10-15 02:44:39,949 DEV : loss 0.1609293520450592 - f1-score (micro avg) 0.7516
2023-10-15 02:44:39,975 saving best model
2023-10-15 02:44:40,880 ----------------------------------------------------------------------------------------------------
2023-10-15 02:44:57,848 epoch 5 - iter 89/894 - loss 0.03775608 - time (sec): 16.97 - samples/sec: 531.50 - lr: 0.000105 - momentum: 0.000000
2023-10-15 02:45:14,380 epoch 5 - iter 178/894 - loss 0.03532971 - time (sec): 33.50 - samples/sec: 524.28 - lr: 0.000103 - momentum: 0.000000
2023-10-15 02:45:31,264 epoch 5 - iter 267/894 - loss 0.03513730 - time (sec): 50.38 - samples/sec: 530.22 - lr: 0.000101 - momentum: 0.000000
2023-10-15 02:45:47,982 epoch 5 - iter 356/894 - loss 0.03993442 - time (sec): 67.10 - samples/sec: 528.81 - lr: 0.000100 - momentum: 0.000000
2023-10-15 02:46:06,528 epoch 5 - iter 445/894 - loss 0.04620953 - time (sec): 85.65 - samples/sec: 529.50 - lr: 0.000098 - momentum: 0.000000
2023-10-15 02:46:22,864 epoch 5 - iter 534/894 - loss 0.04713336 - time (sec): 101.98 - samples/sec: 527.63 - lr: 0.000096 - momentum: 0.000000
2023-10-15 02:46:38,857 epoch 5 - iter 623/894 - loss 0.04525939 - time (sec): 117.97 - samples/sec: 521.37 - lr: 0.000094 - momentum: 0.000000
2023-10-15 02:46:55,181 epoch 5 - iter 712/894 - loss 0.04267360 - time (sec): 134.30 - samples/sec: 519.69 - lr: 0.000093 - momentum: 0.000000
2023-10-15 02:47:11,200 epoch 5 - iter 801/894 - loss 0.04096140 - time (sec): 150.32 - samples/sec: 516.27 - lr: 0.000091 - momentum: 0.000000
2023-10-15 02:47:27,819 epoch 5 - iter 890/894 - loss 0.04206314 - time (sec): 166.94 - samples/sec: 516.61 - lr: 0.000089 - momentum: 0.000000
2023-10-15 02:47:28,490 ----------------------------------------------------------------------------------------------------
2023-10-15 02:47:28,490 EPOCH 5 done: loss 0.0419 - lr: 0.000089
2023-10-15 02:47:53,602 DEV : loss 0.1873023360967636 - f1-score (micro avg) 0.7704
2023-10-15 02:47:53,628 saving best model
2023-10-15 02:47:54,607 ----------------------------------------------------------------------------------------------------
2023-10-15 02:48:12,579 epoch 6 - iter 89/894 - loss 0.04090361 - time (sec): 17.97 - samples/sec: 505.20 - lr: 0.000087 - momentum: 0.000000
2023-10-15 02:48:30,268 epoch 6 - iter 178/894 - loss 0.03139198 - time (sec): 35.66 - samples/sec: 515.84 - lr: 0.000085 - momentum: 0.000000
2023-10-15 02:48:46,856 epoch 6 - iter 267/894 - loss 0.03064485 - time (sec): 52.25 - samples/sec: 522.23 - lr: 0.000084 - momentum: 0.000000
2023-10-15 02:49:03,120 epoch 6 - iter 356/894 - loss 0.02852187 - time (sec): 68.51 - samples/sec: 516.68 - lr: 0.000082 - momentum: 0.000000
2023-10-15 02:49:20,293 epoch 6 - iter 445/894 - loss 0.02788748 - time (sec): 85.68 - samples/sec: 508.91 - lr: 0.000080 - momentum: 0.000000
2023-10-15 02:49:37,772 epoch 6 - iter 534/894 - loss 0.02794351 - time (sec): 103.16 - samples/sec: 506.10 - lr: 0.000078 - momentum: 0.000000
2023-10-15 02:49:55,339 epoch 6 - iter 623/894 - loss 0.02802367 - time (sec): 120.73 - samples/sec: 502.99 - lr: 0.000077 - momentum: 0.000000
2023-10-15 02:50:12,915 epoch 6 - iter 712/894 - loss 0.02792162 - time (sec): 138.31 - samples/sec: 500.00 - lr: 0.000075 - momentum: 0.000000
2023-10-15 02:50:29,545 epoch 6 - iter 801/894 - loss 0.02919230 - time (sec): 154.94 - samples/sec: 499.81 - lr: 0.000073 - momentum: 0.000000
2023-10-15 02:50:46,019 epoch 6 - iter 890/894 - loss 0.02781489 - time (sec): 171.41 - samples/sec: 503.33 - lr: 0.000071 - momentum: 0.000000
2023-10-15 02:50:46,685 ----------------------------------------------------------------------------------------------------
2023-10-15 02:50:46,685 EPOCH 6 done: loss 0.0278 - lr: 0.000071
2023-10-15 02:51:13,116 DEV : loss 0.21434210240840912 - f1-score (micro avg) 0.7671
2023-10-15 02:51:13,144 ----------------------------------------------------------------------------------------------------
2023-10-15 02:51:32,358 epoch 7 - iter 89/894 - loss 0.03055746 - time (sec): 19.21 - samples/sec: 506.79 - lr: 0.000069 - momentum: 0.000000
2023-10-15 02:51:49,459 epoch 7 - iter 178/894 - loss 0.03178652 - time (sec): 36.31 - samples/sec: 508.13 - lr: 0.000068 - momentum: 0.000000
2023-10-15 02:52:05,517 epoch 7 - iter 267/894 - loss 0.02947788 - time (sec): 52.37 - samples/sec: 496.56 - lr: 0.000066 - momentum: 0.000000
2023-10-15 02:52:21,830 epoch 7 - iter 356/894 - loss 0.02621194 - time (sec): 68.68 - samples/sec: 497.65 - lr: 0.000064 - momentum: 0.000000
2023-10-15 02:52:38,189 epoch 7 - iter 445/894 - loss 0.02435064 - time (sec): 85.04 - samples/sec: 498.38 - lr: 0.000062 - momentum: 0.000000
2023-10-15 02:52:54,896 epoch 7 - iter 534/894 - loss 0.02215860 - time (sec): 101.75 - samples/sec: 503.67 - lr: 0.000061 - momentum: 0.000000
2023-10-15 02:53:11,073 epoch 7 - iter 623/894 - loss 0.02099170 - time (sec): 117.93 - samples/sec: 501.57 - lr: 0.000059 - momentum: 0.000000
2023-10-15 02:53:27,755 epoch 7 - iter 712/894 - loss 0.01995778 - time (sec): 134.61 - samples/sec: 504.04 - lr: 0.000057 - momentum: 0.000000
2023-10-15 02:53:44,630 epoch 7 - iter 801/894 - loss 0.01902734 - time (sec): 151.48 - samples/sec: 507.27 - lr: 0.000055 - momentum: 0.000000
2023-10-15 02:54:02,012 epoch 7 - iter 890/894 - loss 0.01833316 - time (sec): 168.87 - samples/sec: 510.97 - lr: 0.000053 - momentum: 0.000000
2023-10-15 02:54:02,650 ----------------------------------------------------------------------------------------------------
2023-10-15 02:54:02,651 EPOCH 7 done: loss 0.0185 - lr: 0.000053
2023-10-15 02:54:28,844 DEV : loss 0.21627573668956757 - f1-score (micro avg) 0.7695
2023-10-15 02:54:28,870 ----------------------------------------------------------------------------------------------------
2023-10-15 02:54:45,753 epoch 8 - iter 89/894 - loss 0.01084712 - time (sec): 16.88 - samples/sec: 505.28 - lr: 0.000052 - momentum: 0.000000
2023-10-15 02:55:03,142 epoch 8 - iter 178/894 - loss 0.01079641 - time (sec): 34.27 - samples/sec: 513.12 - lr: 0.000050 - momentum: 0.000000
2023-10-15 02:55:20,027 epoch 8 - iter 267/894 - loss 0.00894694 - time (sec): 51.16 - samples/sec: 521.76 - lr: 0.000048 - momentum: 0.000000
2023-10-15 02:55:36,682 epoch 8 - iter 356/894 - loss 0.00974459 - time (sec): 67.81 - samples/sec: 522.44 - lr: 0.000046 - momentum: 0.000000
2023-10-15 02:55:53,263 epoch 8 - iter 445/894 - loss 0.01110363 - time (sec): 84.39 - samples/sec: 522.29 - lr: 0.000045 - momentum: 0.000000
2023-10-15 02:56:09,389 epoch 8 - iter 534/894 - loss 0.01181829 - time (sec): 100.52 - samples/sec: 517.26 - lr: 0.000043 - momentum: 0.000000
2023-10-15 02:56:26,017 epoch 8 - iter 623/894 - loss 0.01176535 - time (sec): 117.15 - samples/sec: 516.61 - lr: 0.000041 - momentum: 0.000000
2023-10-15 02:56:42,522 epoch 8 - iter 712/894 - loss 0.01085512 - time (sec): 133.65 - samples/sec: 516.06 - lr: 0.000039 - momentum: 0.000000
2023-10-15 02:57:01,040 epoch 8 - iter 801/894 - loss 0.01135469 - time (sec): 152.17 - samples/sec: 515.39 - lr: 0.000038 - momentum: 0.000000
2023-10-15 02:57:17,154 epoch 8 - iter 890/894 - loss 0.01148836 - time (sec): 168.28 - samples/sec: 512.32 - lr: 0.000036 - momentum: 0.000000
2023-10-15 02:57:17,838 ----------------------------------------------------------------------------------------------------
2023-10-15 02:57:17,839 EPOCH 8 done: loss 0.0118 - lr: 0.000036
2023-10-15 02:57:44,212 DEV : loss 0.22242531180381775 - f1-score (micro avg) 0.7719
2023-10-15 02:57:44,240 saving best model
2023-10-15 02:57:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:58:02,946 epoch 9 - iter 89/894 - loss 0.00595299 - time (sec): 16.19 - samples/sec: 494.63 - lr: 0.000034 - momentum: 0.000000
2023-10-15 02:58:19,361 epoch 9 - iter 178/894 - loss 0.00644078 - time (sec): 32.61 - samples/sec: 498.46 - lr: 0.000032 - momentum: 0.000000
2023-10-15 02:58:35,843 epoch 9 - iter 267/894 - loss 0.01132599 - time (sec): 49.09 - samples/sec: 505.33 - lr: 0.000030 - momentum: 0.000000
2023-10-15 02:58:52,419 epoch 9 - iter 356/894 - loss 0.00904748 - time (sec): 65.67 - samples/sec: 510.07 - lr: 0.000029 - momentum: 0.000000
2023-10-15 02:59:10,359 epoch 9 - iter 445/894 - loss 0.01098782 - time (sec): 83.60 - samples/sec: 506.84 - lr: 0.000027 - momentum: 0.000000
2023-10-15 02:59:27,525 epoch 9 - iter 534/894 - loss 0.01062907 - time (sec): 100.77 - samples/sec: 512.80 - lr: 0.000025 - momentum: 0.000000
2023-10-15 02:59:44,394 epoch 9 - iter 623/894 - loss 0.01018680 - time (sec): 117.64 - samples/sec: 512.48 - lr: 0.000023 - momentum: 0.000000
2023-10-15 03:00:01,280 epoch 9 - iter 712/894 - loss 0.00968917 - time (sec): 134.53 - samples/sec: 513.69 - lr: 0.000022 - momentum: 0.000000
2023-10-15 03:00:17,808 epoch 9 - iter 801/894 - loss 0.00910352 - time (sec): 151.05 - samples/sec: 515.92 - lr: 0.000020 - momentum: 0.000000
2023-10-15 03:00:34,257 epoch 9 - iter 890/894 - loss 0.00910813 - time (sec): 167.50 - samples/sec: 514.73 - lr: 0.000018 - momentum: 0.000000
2023-10-15 03:00:34,964 ----------------------------------------------------------------------------------------------------
2023-10-15 03:00:34,964 EPOCH 9 done: loss 0.0091 - lr: 0.000018
2023-10-15 03:01:01,017 DEV : loss 0.23208962380886078 - f1-score (micro avg) 0.7894
2023-10-15 03:01:01,043 saving best model
2023-10-15 03:01:03,945 ----------------------------------------------------------------------------------------------------
2023-10-15 03:01:22,920 epoch 10 - iter 89/894 - loss 0.00741155 - time (sec): 18.97 - samples/sec: 530.99 - lr: 0.000016 - momentum: 0.000000
2023-10-15 03:01:39,312 epoch 10 - iter 178/894 - loss 0.00663561 - time (sec): 35.36 - samples/sec: 518.24 - lr: 0.000014 - momentum: 0.000000
2023-10-15 03:01:55,571 epoch 10 - iter 267/894 - loss 0.00512243 - time (sec): 51.62 - samples/sec: 515.38 - lr: 0.000013 - momentum: 0.000000
2023-10-15 03:02:11,926 epoch 10 - iter 356/894 - loss 0.00494199 - time (sec): 67.98 - samples/sec: 513.59 - lr: 0.000011 - momentum: 0.000000
2023-10-15 03:02:29,205 epoch 10 - iter 445/894 - loss 0.00519207 - time (sec): 85.26 - samples/sec: 519.56 - lr: 0.000009 - momentum: 0.000000
2023-10-15 03:02:45,598 epoch 10 - iter 534/894 - loss 0.00528129 - time (sec): 101.65 - samples/sec: 517.28 - lr: 0.000007 - momentum: 0.000000
2023-10-15 03:03:02,594 epoch 10 - iter 623/894 - loss 0.00652465 - time (sec): 118.65 - samples/sec: 515.94 - lr: 0.000006 - momentum: 0.000000
2023-10-15 03:03:18,989 epoch 10 - iter 712/894 - loss 0.00616289 - time (sec): 135.04 - samples/sec: 517.96 - lr: 0.000004 - momentum: 0.000000
2023-10-15 03:03:35,161 epoch 10 - iter 801/894 - loss 0.00585123 - time (sec): 151.21 - samples/sec: 516.29 - lr: 0.000002 - momentum: 0.000000
2023-10-15 03:03:51,457 epoch 10 - iter 890/894 - loss 0.00587903 - time (sec): 167.51 - samples/sec: 515.29 - lr: 0.000000 - momentum: 0.000000
2023-10-15 03:03:52,093 ----------------------------------------------------------------------------------------------------
2023-10-15 03:03:52,093 EPOCH 10 done: loss 0.0059 - lr: 0.000000
2023-10-15 03:04:17,600 DEV : loss 0.2378893792629242 - f1-score (micro avg) 0.7848
2023-10-15 03:04:18,259 ----------------------------------------------------------------------------------------------------
2023-10-15 03:04:18,261 Loading model from best epoch ...
2023-10-15 03:04:26,176 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
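With the best checkpoint restored, the tagger predicts BIOES-style loc/pers/org/prod/time labels. A small usage sketch, assuming the checkpoint path from this run (a published Hugging Face model id could be passed to SequenceTagger.load() instead):

# Hedged usage sketch: load the best checkpoint and tag a sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-hipe2020/de-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-"
    "poolingfirst-layers-1-crfFalse-4/best-model.pt"
)

# Hypothetical example sentence (German, matching the corpus language).
sentence = Sentence("Der Verein wurde 1897 in Basel gegründet .")
tagger.predict(sentence)

# Each predicted span carries one of the five entity types (loc, pers, org, prod, time).
for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value, round(span.get_label("ner").score, 3))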
2023-10-15 03:04:49,929
Results:
- F-score (micro) 0.7741
- F-score (macro) 0.6991
- Accuracy 0.6448
By class:
              precision    recall  f1-score   support

         loc     0.8588    0.8674    0.8631       596
        pers     0.6898    0.7748    0.7298       333
         org     0.5923    0.5833    0.5878       132
        prod     0.6731    0.5303    0.5932        66
        time     0.7292    0.7143    0.7216        49

   micro avg     0.7645    0.7840    0.7741      1176
   macro avg     0.7086    0.6940    0.6991      1176
weighted avg     0.7652    0.7840    0.7734      1176
2023-10-15 03:04:49,929 ----------------------------------------------------------------------------------------------------
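The table above is the test-set evaluation of the best epoch (micro-F1 0.7741 over 1176 gold spans). A hedged sketch of reproducing such a report with Flair's evaluate call, reusing the corpus and tagger objects from the earlier sketches:

# Hedged sketch: evaluating the loaded tagger on the HIPE-2020 German test split.
# Assumes the `corpus` and `tagger` objects defined in the sketches above.
result = tagger.evaluate(
    corpus.test,
    gold_label_type="ner",
    mini_batch_size=4,
)

# main_score is the micro-average F1; detailed_results holds the per-class table.
print(result.main_score)
print(result.detailed_results)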