2023-10-25 21:08:47,420 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,421 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 21:08:47,421 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,421 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-25
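The lr column in the iteration entries below follows from the LinearScheduler plugin (warmup_fraction '0.1') with learning_rate 3e-05 over 10 epochs of 136 iterations (~1360 optimizer steps): the rate climbs linearly to the peak during roughly the first epoch (e.g. 0.000003 at step 13, ~0.000030 entering epoch 2), then decays linearly to zero. A minimal sketch of that schedule — the function name and the transformers-style warmup/decay formula are assumptions for illustration, not Flair's exact implementation:

```python
def linear_warmup_lr(step, total_steps=1360, warmup_fraction=0.1, peak_lr=3e-05):
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)  # 136 steps = ~1 epoch here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # ramp up
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)  # decay

# e.g. linear_warmup_lr(13) is ~2.9e-06, matching the first logged lr of 0.000003
```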
21:08:47,421 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,421 Train:  1085 sentences
2023-10-25 21:08:47,422         (train_with_dev=False, train_with_test=False)
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 Training Params:
2023-10-25 21:08:47,422  - learning_rate: "3e-05"
2023-10-25 21:08:47,422  - mini_batch_size: "8"
2023-10-25 21:08:47,422  - max_epochs: "10"
2023-10-25 21:08:47,422  - shuffle: "True"
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 Plugins:
2023-10-25 21:08:47,422  - TensorboardLogger
2023-10-25 21:08:47,422  - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 21:08:47,422  - metric: "('micro avg', 'f1-score')"
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 Computation:
2023-10-25 21:08:47,422  - compute on device: cuda:0
2023-10-25 21:08:47,422  - embedding storage: none
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 Model training base path: "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:47,422 Logging anything other than scalars to
TensorBoard is currently not supported.
2023-10-25 21:08:48,643 epoch 1 - iter 13/136 - loss 2.92666559 - time (sec): 1.22 - samples/sec: 4914.04 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:08:49,578 epoch 1 - iter 26/136 - loss 2.61542995 - time (sec): 2.15 - samples/sec: 5007.41 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:08:50,568 epoch 1 - iter 39/136 - loss 2.21106256 - time (sec): 3.14 - samples/sec: 4785.25 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:08:51,544 epoch 1 - iter 52/136 - loss 1.84078109 - time (sec): 4.12 - samples/sec: 4771.60 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:08:52,625 epoch 1 - iter 65/136 - loss 1.58946069 - time (sec): 5.20 - samples/sec: 4719.53 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:08:53,772 epoch 1 - iter 78/136 - loss 1.38636035 - time (sec): 6.35 - samples/sec: 4665.34 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:08:54,771 epoch 1 - iter 91/136 - loss 1.23315047 - time (sec): 7.35 - samples/sec: 4740.34 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:08:55,708 epoch 1 - iter 104/136 - loss 1.11372936 - time (sec): 8.28 - samples/sec: 4834.46 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:08:56,692 epoch 1 - iter 117/136 - loss 1.02352100 - time (sec): 9.27 - samples/sec: 4837.71 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:08:57,649 epoch 1 - iter 130/136 - loss 0.94638674 - time (sec): 10.23 - samples/sec: 4854.97 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:08:58,158 ----------------------------------------------------------------------------------------------------
2023-10-25 21:08:58,159 EPOCH 1 done: loss 0.9140 - lr: 0.000028
2023-10-25 21:08:59,167 DEV : loss 0.17706048488616943 - f1-score (micro avg) 0.5705
2023-10-25 21:08:59,192 saving best model
2023-10-25 21:08:59,700 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:00,753 epoch 2 - iter 13/136 - loss 0.18751231 - time (sec): 1.05 - samples/sec: 5144.60 - lr: 0.000030 - momentum: 0.000000
2023-10-25 21:09:01,666 epoch 2 - iter 26/136 - loss 0.16800902 - time (sec): 1.96 - samples/sec: 4981.93 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:09:02,626 epoch 2 - iter 39/136 - loss 0.18325524 - time (sec): 2.92 - samples/sec: 4890.39 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:09:03,729 epoch 2 - iter 52/136 - loss 0.17724190 - time (sec): 4.03 - samples/sec: 4942.79 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:09:04,766 epoch 2 - iter 65/136 - loss 0.17625816 - time (sec): 5.06 - samples/sec: 4934.27 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:09:05,926 epoch 2 - iter 78/136 - loss 0.16835613 - time (sec): 6.22 - samples/sec: 4867.14 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:09:06,900 epoch 2 - iter 91/136 - loss 0.16168936 - time (sec): 7.20 - samples/sec: 4906.92 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:09:07,973 epoch 2 - iter 104/136 - loss 0.15601564 - time (sec): 8.27 - samples/sec: 4941.16 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:09:08,950 epoch 2 - iter 117/136 - loss 0.15334431 - time (sec): 9.25 - samples/sec: 4972.13 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:09:09,858 epoch 2 - iter 130/136 - loss 0.15087859 - time (sec): 10.16 - samples/sec: 4896.45 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:09:10,316 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:10,316 EPOCH 2 done: loss 0.1473 - lr: 0.000027
2023-10-25 21:09:11,511 DEV : loss 0.11033203452825546 - f1-score (micro avg) 0.7569
2023-10-25 21:09:11,518 saving best model
2023-10-25 21:09:12,238 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:13,260 epoch 3 - iter 13/136 - loss 0.06875991 - time (sec): 1.02 - samples/sec: 5015.96 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:09:14,556 epoch 3 - iter 26/136 - loss 0.07190632 - time (sec): 2.32 - samples/sec: 4099.17 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:09:15,524 epoch 3 - iter 39/136 - loss 0.07481653 - time (sec): 3.28 - samples/sec: 4358.62 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:09:16,404 epoch 3 - iter 52/136 - loss 0.08443368 - time (sec): 4.16 - samples/sec: 4414.42 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:09:17,438 epoch 3 - iter 65/136 - loss 0.08616651 - time (sec): 5.20 - samples/sec: 4545.20 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:09:18,441 epoch 3 - iter 78/136 - loss 0.08917736 - time (sec): 6.20 - samples/sec: 4634.32 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:09:19,473 epoch 3 - iter 91/136 - loss 0.08540801 - time (sec): 7.23 - samples/sec: 4719.75 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:09:20,523 epoch 3 - iter 104/136 - loss 0.08185322 - time (sec): 8.28 - samples/sec: 4745.35 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:09:21,534 epoch 3 - iter 117/136 - loss 0.08468495 - time (sec): 9.29 - samples/sec: 4841.87 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:09:22,667 epoch 3 - iter 130/136 - loss 0.08395335 - time (sec): 10.43 - samples/sec: 4789.86 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:09:23,089 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:23,090 EPOCH 3 done: loss 0.0829 - lr: 0.000024
2023-10-25 21:09:24,242 DEV : loss 0.09931305795907974 - f1-score (micro avg) 0.789
2023-10-25 21:09:24,249 saving best model
2023-10-25 21:09:24,943 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:25,916 epoch 4 - iter 13/136 - loss 0.05349351 - time (sec): 0.97 - samples/sec: 5163.61 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:09:26,897 epoch 4 - iter 26/136 - loss 0.06098037 - time (sec): 1.95 - samples/sec: 5008.23 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:09:27,968 epoch 4 - iter 39/136 - loss 0.05921489 - time (sec): 3.02 - samples/sec: 4988.59 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:09:28,941 epoch 4 - iter 52/136 - loss 0.05684626 - time (sec): 4.00 - samples/sec: 4963.35 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:09:29,924 epoch 4 - iter 65/136 - loss 0.05503768 - time (sec): 4.98 - samples/sec: 4935.30 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:09:30,854 epoch 4 - iter 78/136 - loss 0.05465638 - time (sec): 5.91 - samples/sec: 4961.39 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:09:31,887 epoch 4 - iter 91/136 - loss 0.05219145 - time (sec): 6.94 - samples/sec: 5014.03 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:09:32,825 epoch 4 - iter 104/136 - loss 0.05252782 - time (sec): 7.88 - samples/sec: 5000.16 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:09:33,875 epoch 4 - iter 117/136 - loss 0.05015941 - time (sec): 8.93 - samples/sec: 5042.05 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:09:34,846 epoch 4 - iter 130/136 - loss 0.04876858 - time (sec): 9.90 - samples/sec: 4991.05 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:09:35,366 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:35,366 EPOCH 4 done: loss 0.0499 - lr: 0.000020
2023-10-25 21:09:36,532 DEV : loss 0.10973858088254929 - f1-score (micro avg) 0.7941
2023-10-25 21:09:36,541 saving best model
2023-10-25 21:09:37,383 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:39,080 epoch 5 - iter 13/136 - loss 0.03025322 - time (sec): 1.69 - samples/sec: 3026.98 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:09:40,028 epoch 5 - iter 26/136 - loss 0.03412351 - time (sec): 2.64 - samples/sec: 3626.59 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:09:40,987 epoch 5 - iter 39/136 - loss 0.03068977 - time (sec): 3.60 - samples/sec: 4043.68 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:09:41,968 epoch 5 - iter 52/136 - loss 0.03638708 - time (sec): 4.58 - samples/sec: 4260.56 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:09:42,947 epoch 5 - iter 65/136 - loss 0.03194803 - time (sec): 5.56 - samples/sec: 4354.05 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:09:44,106 epoch 5 - iter 78/136 - loss 0.03026381 - time (sec): 6.72 - samples/sec: 4436.80 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:09:45,014 epoch 5 - iter 91/136 - loss 0.02997471 - time (sec): 7.63 - samples/sec: 4410.64 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:09:46,035 epoch 5 - iter 104/136 - loss 0.03032205 - time (sec): 8.65 - samples/sec: 4468.93 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:09:47,060 epoch 5 - iter 117/136 - loss 0.03204875 - time (sec): 9.67 - samples/sec: 4601.32 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:09:48,164 epoch 5 - iter 130/136 - loss 0.03091804 - time (sec): 10.78 - samples/sec: 4690.60 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:09:48,549 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:48,549 EPOCH 5 done: loss 0.0307 - lr: 0.000017
2023-10-25 21:09:49,824 DEV : loss 0.11609525978565216 - f1-score (micro avg) 0.8
2023-10-25 21:09:49,831 saving best model
2023-10-25 21:09:50,591 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:51,709 epoch 6 - iter 13/136 - loss 0.02566814 - time (sec): 1.11 - samples/sec: 5423.13 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:09:52,715 epoch 6 - iter 26/136 - loss 0.02241577 - time (sec): 2.12 - samples/sec: 5308.38 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:09:53,619 epoch 6 - iter 39/136 - loss 0.02325505 - time (sec): 3.02 - samples/sec: 5079.48 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:09:54,589 epoch 6 - iter 52/136 - loss 0.02348731 - time (sec): 3.99 - samples/sec: 5020.51 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:09:55,511 epoch 6 - iter 65/136 - loss 0.02186652 - time (sec): 4.92 - samples/sec: 5038.20 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:09:56,452 epoch 6 - iter 78/136 - loss 0.02325046 - time (sec): 5.86 - samples/sec: 5018.77 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:09:57,399 epoch 6 - iter 91/136 - loss 0.02462943 - time (sec): 6.80 - samples/sec: 4995.00 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:09:58,560 epoch 6 - iter 104/136 - loss 0.02239539 - time (sec): 7.97 - samples/sec: 4979.76 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:09:59,582 epoch 6 - iter 117/136 - loss 0.02331083 - time (sec): 8.99 - samples/sec: 4966.99 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:10:00,570 epoch 6 - iter 130/136 - loss 0.02248515 - time (sec): 9.98 - samples/sec: 4959.65 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:10:01,072 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:01,072 EPOCH 6 done: loss 0.0220 - lr: 0.000014
2023-10-25 21:10:02,245 DEV : loss 0.12799657881259918 - f1-score (micro avg) 0.797
2023-10-25 21:10:02,250 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:03,250 epoch 7 - iter 13/136 - loss 0.00665615 - time (sec): 1.00 - samples/sec: 4884.85 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:10:04,740 epoch 7 - iter 26/136 - loss 0.00791291 - time (sec): 2.49 - samples/sec: 4089.19 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:10:05,744 epoch 7 - iter 39/136 - loss 0.01077149 - time (sec): 3.49 - samples/sec: 4323.59 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:10:06,679 epoch 7 - iter 52/136 - loss 0.01133666 - time (sec): 4.43 - samples/sec: 4628.27 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:10:07,626 epoch 7 - iter 65/136 - loss 0.01753632 - time (sec): 5.37 - samples/sec: 4689.96 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:10:08,647 epoch 7 - iter 78/136 - loss 0.01760847 - time (sec): 6.40 - samples/sec: 4727.70 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:10:09,641 epoch 7 - iter 91/136 - loss 0.01675421 - time (sec): 7.39 - samples/sec: 4782.34 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:10:10,586 epoch 7 - iter 104/136 - loss 0.01711306 - time (sec): 8.33 - samples/sec: 4843.88 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:10:11,593 epoch 7 - iter 117/136 - loss 0.01794913 - time (sec): 9.34 - samples/sec: 4807.83 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:10:12,539 epoch 7 - iter 130/136 - loss 0.01874575 - time (sec): 10.29 - samples/sec: 4802.32 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:10:13,069 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:13,069 EPOCH 7 done: loss 0.0182 - lr: 0.000010
2023-10-25 21:10:14,321 DEV : loss 0.13486173748970032 - f1-score (micro avg) 0.8281
2023-10-25 21:10:14,329 saving best model
2023-10-25 21:10:15,082 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:16,039 epoch 8 - iter 13/136 - loss 0.01227910 - time (sec): 0.95 - samples/sec: 4842.54 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:10:17,204 epoch 8 - iter 26/136 - loss 0.00885702 - time (sec): 2.12 - samples/sec: 4672.53 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:10:18,179 epoch 8 - iter 39/136 - loss 0.01074911 - time (sec): 3.09 - samples/sec: 4742.89 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:10:19,131 epoch 8 - iter 52/136 - loss 0.01018039 - time (sec): 4.05 - samples/sec: 4808.91 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:10:20,185 epoch 8 - iter 65/136 - loss 0.01119058 - time (sec): 5.10 - samples/sec: 4843.21 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:10:21,283 epoch 8 - iter 78/136 - loss 0.01268907 - time (sec): 6.20 - samples/sec: 4841.75 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:10:22,441 epoch 8 - iter 91/136 - loss 0.01242028 - time (sec): 7.36 - samples/sec: 4883.72 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:10:23,625 epoch 8 - iter 104/136 - loss 0.01266673 - time (sec): 8.54 - samples/sec: 4866.30 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:10:24,634 epoch 8 - iter 117/136 - loss 0.01264248 - time (sec): 9.55 - samples/sec: 4837.20 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:10:25,520 epoch 8 - iter 130/136 - loss 0.01326842 - time (sec): 10.43 - samples/sec: 4758.37 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:10:25,914 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:25,914 EPOCH 8 done: loss 0.0131 - lr: 0.000007
2023-10-25 21:10:27,062 DEV : loss 0.14558616280555725 - f1-score (micro avg) 0.8147
2023-10-25 21:10:27,069 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:28,014 epoch 9 - iter 13/136 - loss 0.01001810 - time (sec): 0.94 - samples/sec: 5034.99 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:10:29,051 epoch 9 - iter 26/136 - loss 0.00772984 - time (sec): 1.98 - samples/sec: 4516.48 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:10:30,389 epoch 9 - iter 39/136 - loss 0.00784489 - time (sec): 3.32 - samples/sec: 4345.95 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:10:31,407 epoch 9 - iter 52/136 - loss 0.01058244 - time (sec): 4.34 - samples/sec: 4456.06 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:10:32,405 epoch 9 - iter 65/136 - loss 0.01127323 - time (sec): 5.33 - samples/sec: 4553.44 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:10:33,559 epoch 9 - iter 78/136 - loss 0.01163534 - time (sec): 6.49 - samples/sec: 4539.63 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:10:34,498 epoch 9 - iter 91/136 - loss 0.01127284 - time (sec): 7.43 - samples/sec: 4635.38 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:10:35,533 epoch 9 - iter 104/136 - loss 0.01078184 - time (sec): 8.46 - samples/sec: 4800.70 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:10:36,530 epoch 9 - iter 117/136 - loss 0.01084380 - time (sec): 9.46 - samples/sec: 4754.13 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:10:37,546 epoch 9 - iter 130/136 - loss 0.01095038 - time (sec): 10.48 - samples/sec: 4764.09 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:10:37,994 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:37,995 EPOCH 9 done: loss 0.0107 - lr: 0.000004
2023-10-25 21:10:39,157 DEV : loss 0.16140978038311005 - f1-score (micro avg) 0.8015
2023-10-25 21:10:39,165 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:40,120 epoch 10 - iter 13/136 - loss 0.00972299 - time (sec): 0.95 - samples/sec: 4572.15 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:10:41,098 epoch 10 - iter 26/136 - loss 0.01249253 - time (sec): 1.93 - samples/sec: 4720.98 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:10:42,120 epoch 10 - iter 39/136 - loss 0.00965118 - time (sec): 2.95 - samples/sec: 4811.09 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:10:43,281 epoch 10 - iter 52/136 - loss 0.00856041 - time (sec): 4.12 - samples/sec: 4799.54 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:10:44,217 epoch 10 - iter 65/136 - loss 0.00916437 - time (sec): 5.05 - samples/sec: 4773.59 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:10:45,221 epoch 10 - iter 78/136 - loss 0.00803505 - time (sec): 6.05 - samples/sec: 4773.71 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:10:46,319 epoch 10 - iter 91/136 - loss 0.00748232 - time (sec): 7.15 - samples/sec: 4860.13 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:10:47,324 epoch 10 - iter 104/136 - loss 0.00770687 - time (sec): 8.16 - samples/sec: 4835.05 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:10:48,400 epoch 10 - iter 117/136 - loss 0.00779841 - time (sec): 9.23 - samples/sec: 4852.41 - lr: 0.000001
- momentum: 0.000000
2023-10-25 21:10:49,391 epoch 10 - iter 130/136 - loss 0.00811627 - time (sec): 10.22 - samples/sec: 4853.42 - lr: 0.000000 - momentum: 0.000000
2023-10-25 21:10:49,858 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:49,859 EPOCH 10 done: loss 0.0078 - lr: 0.000000
2023-10-25 21:10:51,112 DEV : loss 0.1566716879606247 - f1-score (micro avg) 0.808
2023-10-25 21:10:51,662 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:51,664 Loading model from best epoch ...
2023-10-25 21:10:54,411 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-25 21:10:56,739 Results:
- F-score (micro) 0.7933
- F-score (macro) 0.7392
- Accuracy 0.6726

By class:
              precision    recall  f1-score   support

         LOC     0.8449    0.8558    0.8503       312
         PER     0.6935    0.8702    0.7719       208
         ORG     0.5870    0.4909    0.5347        55
   HumanProd     0.7143    0.9091    0.8000        22

   micro avg     0.7604    0.8291    0.7933       597
   macro avg     0.7099    0.7815    0.7392       597
weighted avg     0.7636    0.8291    0.7920       597

2023-10-25 21:10:56,739 ----------------------------------------------------------------------------------------------------
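The micro and macro averages in the final table follow directly from per-class counts. A minimal sketch recomputing them — the integer (TP, predicted, gold) triples below are recovered approximately from the reported precision/recall/support (e.g. LOC: recall 0.8558 × 312 ≈ 267 true positives), and the `counts`/`f1` names are illustrative, not Flair's evaluation code:

```python
# Per-class (true positives, predicted spans, gold spans), recovered from the
# reported precision, recall and support columns.
counts = {
    "LOC":       (267, 316, 312),
    "PER":       (181, 261, 208),
    "ORG":       ( 27,  46,  55),
    "HumanProd": ( 20,  28,  22),
}

def f1(tp, pred, gold):
    p, r = tp / pred, tp / gold
    return 2 * p * r / (p + r)

# Micro average: pool counts over all classes, then compute P/R/F1 once.
tp, pred, gold = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = f1(tp, pred, gold)   # matches the reported 0.7933

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)   # matches 0.7392
```

Micro averaging therefore tracks the frequent LOC and PER classes, while the macro average is pulled down by the rare ORG class (F1 0.5347 on 55 gold spans).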