2023-10-25 15:07:16,252 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,253 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 15:07:16,253 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,254 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,254 Train: 7142 sentences
2023-10-25 15:07:16,254 (train_with_dev=False, train_with_test=False)
2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,254 Training Params:
2023-10-25 15:07:16,254 - learning_rate: "3e-05"
2023-10-25 15:07:16,254 - mini_batch_size: "8"
2023-10-25 15:07:16,254 - max_epochs: "10"
2023-10-25 15:07:16,254 - shuffle: "True"
2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,254 Plugins:
2023-10-25 15:07:16,254 - TensorboardLogger
2023-10-25 15:07:16,254 - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
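The per-step learning rates reported below are consistent with the LinearScheduler plugin above: the LR ramps linearly from 0 to 3e-05 over the first 10% of all updates (893 of 8930 steps, i.e. epoch 1), then decays linearly back to 0. A minimal sketch of that schedule, assuming these totals (function and parameter names are ours, not Flair's):

```python
def linear_warmup_lr(step, total_steps=8930, max_lr=3e-05, warmup_fraction=0.1):
    """Linear warmup to max_lr, then linear decay to 0 (one-cycle, no restarts)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 893 steps here
    if step <= warmup_steps:
        return max_lr * step / warmup_steps            # warmup phase
    return max_lr * (total_steps - step) / (total_steps - warmup_steps)  # decay phase
```

With these numbers the sketch reproduces the logged values, e.g. lr ≈ 0.000003 at iter 89 of epoch 1 and lr ≈ 0.000027 late in epoch 2.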
2023-10-25 15:07:16,254 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 15:07:16,254 - metric: "('micro avg', 'f1-score')"
2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,254 Computation:
2023-10-25 15:07:16,254 - compute on device: cuda:0
2023-10-25 15:07:16,254 - embedding storage: none
2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,254 Model training base path: "hmbench-newseye/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-25 15:07:16,255 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,255 ----------------------------------------------------------------------------------------------------
2023-10-25 15:07:16,255 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 15:07:22,287 epoch 1 - iter 89/893 - loss 2.32679756 - time (sec): 6.03 - samples/sec: 4228.16 - lr: 0.000003 - momentum: 0.000000
2023-10-25 15:07:27,984 epoch 1 - iter 178/893 - loss 1.51396462 - time (sec): 11.73 - samples/sec: 4163.39 - lr: 0.000006 - momentum: 0.000000
2023-10-25 15:07:33,714 epoch 1 - iter 267/893 - loss 1.14478279 - time (sec): 17.46 - samples/sec: 4138.90 - lr: 0.000009 - momentum: 0.000000
2023-10-25 15:07:39,758 epoch 1 - iter 356/893 - loss 0.92624548 - time (sec): 23.50 - samples/sec: 4118.92 - lr: 0.000012 - momentum: 0.000000
2023-10-25 15:07:45,647 epoch 1 - iter 445/893 - loss 0.77920214 - time (sec): 29.39 - samples/sec: 4149.48 - lr: 0.000015 - momentum: 0.000000
2023-10-25 15:07:51,499 epoch 1 - iter 534/893 - loss 0.67179663 - time (sec): 35.24 - samples/sec: 4208.38 - lr: 0.000018 - momentum: 0.000000
2023-10-25 15:07:57,019 epoch 1 - iter 623/893 - loss 0.60130729 - time (sec): 40.76 - samples/sec: 4247.79 - lr: 0.000021 - momentum: 0.000000
2023-10-25 15:08:02,490 epoch 1 - iter 712/893 - loss 0.54657076 - time (sec): 46.23 - samples/sec: 4282.03 - lr: 0.000024 - momentum: 0.000000
2023-10-25 15:08:07,955 epoch 1 - iter 801/893 - loss 0.50230078 - time (sec): 51.70 - samples/sec: 4306.93 - lr: 0.000027 - momentum: 0.000000
2023-10-25 15:08:13,549 epoch 1 - iter 890/893 - loss 0.46653814 - time (sec): 57.29 - samples/sec: 4330.30 - lr: 0.000030 - momentum: 0.000000
2023-10-25 15:08:13,712 ----------------------------------------------------------------------------------------------------
2023-10-25 15:08:13,712 EPOCH 1 done: loss 0.4656 - lr: 0.000030
2023-10-25 15:08:17,345 DEV : loss 0.1060444563627243 - f1-score (micro avg) 0.7387
2023-10-25 15:08:17,369 saving best model
2023-10-25 15:08:17,905 ----------------------------------------------------------------------------------------------------
2023-10-25 15:08:23,847 epoch 2 - iter 89/893 - loss 0.11786204 - time (sec): 5.94 - samples/sec: 4321.04 - lr: 0.000030 - momentum: 0.000000
2023-10-25 15:08:29,333 epoch 2 - iter 178/893 - loss 0.11830957 - time (sec): 11.43 - samples/sec: 4126.75 - lr: 0.000029 - momentum: 0.000000
2023-10-25 15:08:35,534 epoch 2 - iter 267/893 - loss 0.11117703 - time (sec): 17.63 - samples/sec: 4187.28 - lr: 0.000029 - momentum: 0.000000
2023-10-25 15:08:41,384 epoch 2 - iter 356/893 - loss 0.11088492 - time (sec): 23.48 - samples/sec: 4194.66 - lr: 0.000029 - momentum: 0.000000
2023-10-25 15:08:47,307 epoch 2 - iter 445/893 - loss 0.10753992 - time (sec): 29.40 - samples/sec: 4218.91 - lr: 0.000028 - momentum: 0.000000
2023-10-25 15:08:53,058 epoch 2 - iter 534/893 - loss 0.10789428 - time (sec): 35.15 - samples/sec: 4232.20 - lr: 0.000028 - momentum: 0.000000
2023-10-25 15:08:58,673 epoch 2 - iter 623/893 - loss 0.10558977 - time (sec): 40.77 - samples/sec: 4289.97 - lr: 0.000028 - momentum: 0.000000
2023-10-25 15:09:04,088 epoch 2 - iter 712/893 - loss 0.10375529 - time (sec): 46.18 - samples/sec: 4271.55 - lr: 0.000027 - momentum: 0.000000
2023-10-25 15:09:09,613 epoch 2 - iter 801/893 - loss 0.10372405 - time (sec): 51.71 - samples/sec: 4305.77 - lr: 0.000027 - momentum: 0.000000
2023-10-25 15:09:15,240 epoch 2 - iter 890/893 - loss 0.10324515 - time (sec): 57.33 - samples/sec: 4321.41 - lr: 0.000027 - momentum: 0.000000
2023-10-25 15:09:15,429 ----------------------------------------------------------------------------------------------------
2023-10-25 15:09:15,430 EPOCH 2 done: loss 0.1031 - lr: 0.000027
2023-10-25 15:09:20,245 DEV : loss 0.09593858569860458 - f1-score (micro avg) 0.777
2023-10-25 15:09:20,268 saving best model
2023-10-25 15:09:20,917 ----------------------------------------------------------------------------------------------------
2023-10-25 15:09:26,476 epoch 3 - iter 89/893 - loss 0.06107853 - time (sec): 5.56 - samples/sec: 4578.32 - lr: 0.000026 - momentum: 0.000000
2023-10-25 15:09:31,970 epoch 3 - iter 178/893 - loss 0.05655927 - time (sec): 11.05 - samples/sec: 4428.83 - lr: 0.000026 - momentum: 0.000000
2023-10-25 15:09:37,621 epoch 3 - iter 267/893 - loss 0.05765556 - time (sec): 16.70 - samples/sec: 4489.95 - lr: 0.000026 - momentum: 0.000000
2023-10-25 15:09:43,134 epoch 3 - iter 356/893 - loss 0.05899019 - time (sec): 22.22 - samples/sec: 4487.90 - lr: 0.000025 - momentum: 0.000000
2023-10-25 15:09:48,684 epoch 3 - iter 445/893 - loss 0.06059285 - time (sec): 27.77 - samples/sec: 4430.84 - lr: 0.000025 - momentum: 0.000000
2023-10-25 15:09:54,437 epoch 3 - iter 534/893 - loss 0.06203883 - time (sec): 33.52 - samples/sec: 4406.62 - lr: 0.000025 - momentum: 0.000000
2023-10-25 15:10:00,394 epoch 3 - iter 623/893 - loss 0.06228959 - time (sec): 39.48 - samples/sec: 4403.41 - lr: 0.000024 - momentum: 0.000000
2023-10-25 15:10:06,271 epoch 3 - iter 712/893 - loss 0.06223592 - time (sec): 45.35 - samples/sec: 4403.81 - lr: 0.000024 - momentum: 0.000000
2023-10-25 15:10:12,116 epoch 3 - iter 801/893 - loss 0.06176464 - time (sec): 51.20 - samples/sec: 4397.44 - lr: 0.000024 - momentum: 0.000000
2023-10-25 15:10:17,781 epoch 3 - iter 890/893 - loss 0.06140542 - time (sec): 56.86 - samples/sec: 4364.69 - lr: 0.000023 - momentum: 0.000000
2023-10-25 15:10:17,965 ----------------------------------------------------------------------------------------------------
2023-10-25 15:10:17,965 EPOCH 3 done: loss 0.0613 - lr: 0.000023
2023-10-25 15:10:22,849 DEV : loss 0.10392870754003525 - f1-score (micro avg) 0.7824
2023-10-25 15:10:22,870 saving best model
2023-10-25 15:10:23,572 ----------------------------------------------------------------------------------------------------
2023-10-25 15:10:29,407 epoch 4 - iter 89/893 - loss 0.04410301 - time (sec): 5.83 - samples/sec: 4278.04 - lr: 0.000023 - momentum: 0.000000
2023-10-25 15:10:35,183 epoch 4 - iter 178/893 - loss 0.04544728 - time (sec): 11.61 - samples/sec: 4301.00 - lr: 0.000023 - momentum: 0.000000
2023-10-25 15:10:40,777 epoch 4 - iter 267/893 - loss 0.04597693 - time (sec): 17.20 - samples/sec: 4268.80 - lr: 0.000022 - momentum: 0.000000
2023-10-25 15:10:46,358 epoch 4 - iter 356/893 - loss 0.04537082 - time (sec): 22.78 - samples/sec: 4353.99 - lr: 0.000022 - momentum: 0.000000
2023-10-25 15:10:52,237 epoch 4 - iter 445/893 - loss 0.04624815 - time (sec): 28.66 - samples/sec: 4335.26 - lr: 0.000022 - momentum: 0.000000
2023-10-25 15:10:58,294 epoch 4 - iter 534/893 - loss 0.04475990 - time (sec): 34.72 - samples/sec: 4348.02 - lr: 0.000021 - momentum: 0.000000
2023-10-25 15:11:04,125 epoch 4 - iter 623/893 - loss 0.04559760 - time (sec): 40.55 - samples/sec: 4316.02 - lr: 0.000021 - momentum: 0.000000
2023-10-25 15:11:09,950 epoch 4 - iter 712/893 - loss 0.04461195 - time (sec): 46.38 - samples/sec: 4271.43 - lr: 0.000021 - momentum: 0.000000
2023-10-25 15:11:15,961 epoch 4 - iter 801/893 - loss 0.04371497 - time (sec): 52.39 - samples/sec: 4285.47 - lr: 0.000020 - momentum: 0.000000
2023-10-25 15:11:21,628 epoch 4 - iter 890/893 - loss 0.04355757 - time (sec): 58.05 - samples/sec: 4275.23 - lr: 0.000020 - momentum: 0.000000
2023-10-25 15:11:21,806 ----------------------------------------------------------------------------------------------------
2023-10-25 15:11:21,807 EPOCH 4 done: loss 0.0437 - lr: 0.000020
2023-10-25 15:11:25,871 DEV : loss 0.1405394971370697 - f1-score (micro avg) 0.7739
2023-10-25 15:11:25,895 ----------------------------------------------------------------------------------------------------
2023-10-25 15:11:31,806 epoch 5 - iter 89/893 - loss 0.03086483 - time (sec): 5.91 - samples/sec: 4018.56 - lr: 0.000020 - momentum: 0.000000
2023-10-25 15:11:37,763 epoch 5 - iter 178/893 - loss 0.03408901 - time (sec): 11.87 - samples/sec: 4169.34 - lr: 0.000019 - momentum: 0.000000
2023-10-25 15:11:43,934 epoch 5 - iter 267/893 - loss 0.03368610 - time (sec): 18.04 - samples/sec: 4168.56 - lr: 0.000019 - momentum: 0.000000
2023-10-25 15:11:49,742 epoch 5 - iter 356/893 - loss 0.03342150 - time (sec): 23.84 - samples/sec: 4171.56 - lr: 0.000019 - momentum: 0.000000
2023-10-25 15:11:55,551 epoch 5 - iter 445/893 - loss 0.03366136 - time (sec): 29.65 - samples/sec: 4169.40 - lr: 0.000018 - momentum: 0.000000
2023-10-25 15:12:01,573 epoch 5 - iter 534/893 - loss 0.03390282 - time (sec): 35.68 - samples/sec: 4202.78 - lr: 0.000018 - momentum: 0.000000
2023-10-25 15:12:07,368 epoch 5 - iter 623/893 - loss 0.03357706 - time (sec): 41.47 - samples/sec: 4197.80 - lr: 0.000018 - momentum: 0.000000
2023-10-25 15:12:13,096 epoch 5 - iter 712/893 - loss 0.03266215 - time (sec): 47.20 - samples/sec: 4199.37 - lr: 0.000017 - momentum: 0.000000
2023-10-25 15:12:18,874 epoch 5 - iter 801/893 - loss 0.03391376 - time (sec): 52.98 - samples/sec: 4207.00 - lr: 0.000017 - momentum: 0.000000
2023-10-25 15:12:24,683 epoch 5 - iter 890/893 - loss 0.03419362 - time (sec): 58.79 - samples/sec: 4219.53 - lr: 0.000017 - momentum: 0.000000
2023-10-25 15:12:24,885 ----------------------------------------------------------------------------------------------------
2023-10-25 15:12:24,885 EPOCH 5 done: loss 0.0341 - lr: 0.000017
2023-10-25 15:12:29,919 DEV : loss 0.16618064045906067 - f1-score (micro avg) 0.8051
2023-10-25 15:12:29,940 saving best model
2023-10-25 15:12:30,620 ----------------------------------------------------------------------------------------------------
2023-10-25 15:12:36,386 epoch 6 - iter 89/893 - loss 0.01934906 - time (sec): 5.76 - samples/sec: 4367.06 - lr: 0.000016 - momentum: 0.000000
2023-10-25 15:12:42,034 epoch 6 - iter 178/893 - loss 0.01971571 - time (sec): 11.41 - samples/sec: 4277.35 - lr: 0.000016 - momentum: 0.000000
2023-10-25 15:12:48,099 epoch 6 - iter 267/893 - loss 0.01891099 - time (sec): 17.48 - samples/sec: 4240.69 - lr: 0.000016 - momentum: 0.000000
2023-10-25 15:12:54,052 epoch 6 - iter 356/893 - loss 0.02463843 - time (sec): 23.43 - samples/sec: 4232.23 - lr: 0.000015 - momentum: 0.000000
2023-10-25 15:12:59,977 epoch 6 - iter 445/893 - loss 0.02485032 - time (sec): 29.35 - samples/sec: 4266.10 - lr: 0.000015 - momentum: 0.000000
2023-10-25 15:13:05,810 epoch 6 - iter 534/893 - loss 0.02602922 - time (sec): 35.19 - samples/sec: 4217.86 - lr: 0.000015 - momentum: 0.000000
2023-10-25 15:13:11,965 epoch 6 - iter 623/893 - loss 0.02550398 - time (sec): 41.34 - samples/sec: 4206.73 - lr: 0.000014 - momentum: 0.000000
2023-10-25 15:13:17,850 epoch 6 - iter 712/893 - loss 0.02490406 - time (sec): 47.23 - samples/sec: 4219.72 - lr: 0.000014 - momentum: 0.000000
2023-10-25 15:13:23,687 epoch 6 - iter 801/893 - loss 0.02516364 - time (sec): 53.06 - samples/sec: 4220.25 - lr: 0.000014 - momentum: 0.000000
2023-10-25 15:13:29,433 epoch 6 - iter 890/893 - loss 0.02571510 - time (sec): 58.81 - samples/sec: 4220.50 - lr: 0.000013 - momentum: 0.000000
2023-10-25 15:13:29,621 ----------------------------------------------------------------------------------------------------
2023-10-25 15:13:29,622 EPOCH 6 done: loss 0.0258 - lr: 0.000013
2023-10-25 15:13:34,396 DEV : loss 0.17371046543121338 - f1-score (micro avg) 0.8112
2023-10-25 15:13:34,417 saving best model
2023-10-25 15:13:35,072 ----------------------------------------------------------------------------------------------------
2023-10-25 15:13:41,193 epoch 7 - iter 89/893 - loss 0.02011470 - time (sec): 6.12 - samples/sec: 4355.79 - lr: 0.000013 - momentum: 0.000000
2023-10-25 15:13:47,009 epoch 7 - iter 178/893 - loss 0.02325248 - time (sec): 11.94 - samples/sec: 4245.96 - lr: 0.000013 - momentum: 0.000000
2023-10-25 15:13:52,906 epoch 7 - iter 267/893 - loss 0.02032808 - time (sec): 17.83 - samples/sec: 4206.61 - lr: 0.000012 - momentum: 0.000000
2023-10-25 15:13:58,744 epoch 7 - iter 356/893 - loss 0.02038788 - time (sec): 23.67 - samples/sec: 4235.40 - lr: 0.000012 - momentum: 0.000000
2023-10-25 15:14:04,629 epoch 7 - iter 445/893 - loss 0.01997941 - time (sec): 29.56 - samples/sec: 4217.35 - lr: 0.000012 - momentum: 0.000000
2023-10-25 15:14:10,590 epoch 7 - iter 534/893 - loss 0.01925592 - time (sec): 35.52 - samples/sec: 4185.73 - lr: 0.000011 - momentum: 0.000000
2023-10-25 15:14:16,316 epoch 7 - iter 623/893 - loss 0.01978486 - time (sec): 41.24 - samples/sec: 4162.69 - lr: 0.000011 - momentum: 0.000000
2023-10-25 15:14:22,572 epoch 7 - iter 712/893 - loss 0.01991371 - time (sec): 47.50 - samples/sec: 4168.20 - lr: 0.000011 - momentum: 0.000000
2023-10-25 15:14:28,265 epoch 7 - iter 801/893 - loss 0.01994353 - time (sec): 53.19 - samples/sec: 4181.97 - lr: 0.000010 - momentum: 0.000000
2023-10-25 15:14:34,108 epoch 7 - iter 890/893 - loss 0.02008156 - time (sec): 59.03 - samples/sec: 4202.26 - lr: 0.000010 - momentum: 0.000000
2023-10-25 15:14:34,306 ----------------------------------------------------------------------------------------------------
2023-10-25 15:14:34,306 EPOCH 7 done: loss 0.0200 - lr: 0.000010
2023-10-25 15:14:38,320 DEV : loss 0.17937816679477692 - f1-score (micro avg) 0.8123
2023-10-25 15:14:38,342 saving best model
2023-10-25 15:14:39,015 ----------------------------------------------------------------------------------------------------
2023-10-25 15:14:44,770 epoch 8 - iter 89/893 - loss 0.01881002 - time (sec): 5.75 - samples/sec: 4163.23 - lr: 0.000010 - momentum: 0.000000
2023-10-25 15:14:50,519 epoch 8 - iter 178/893 - loss 0.01785540 - time (sec): 11.50 - samples/sec: 4223.67 - lr: 0.000009 - momentum: 0.000000
2023-10-25 15:14:56,466 epoch 8 - iter 267/893 - loss 0.01663373 - time (sec): 17.45 - samples/sec: 4239.61 - lr: 0.000009 - momentum: 0.000000
2023-10-25 15:15:02,200 epoch 8 - iter 356/893 - loss 0.01641502 - time (sec): 23.18 - samples/sec: 4212.11 - lr: 0.000009 - momentum: 0.000000
2023-10-25 15:15:08,033 epoch 8 - iter 445/893 - loss 0.01638939 - time (sec): 29.02 - samples/sec: 4217.12 - lr: 0.000008 - momentum: 0.000000
2023-10-25 15:15:14,046 epoch 8 - iter 534/893 - loss 0.01532943 - time (sec): 35.03 - samples/sec: 4218.47 - lr: 0.000008 - momentum: 0.000000
2023-10-25 15:15:20,415 epoch 8 - iter 623/893 - loss 0.01502360 - time (sec): 41.40 - samples/sec: 4207.79 - lr: 0.000008 - momentum: 0.000000
2023-10-25 15:15:26,296 epoch 8 - iter 712/893 - loss 0.01549364 - time (sec): 47.28 - samples/sec: 4180.53 - lr: 0.000007 - momentum: 0.000000
2023-10-25 15:15:32,060 epoch 8 - iter 801/893 - loss 0.01600174 - time (sec): 53.04 - samples/sec: 4193.62 - lr: 0.000007 - momentum: 0.000000
2023-10-25 15:15:38,072 epoch 8 - iter 890/893 - loss 0.01593321 - time (sec): 59.06 - samples/sec: 4200.02 - lr: 0.000007 - momentum: 0.000000
2023-10-25 15:15:38,267 ----------------------------------------------------------------------------------------------------
2023-10-25 15:15:38,267 EPOCH 8 done: loss 0.0159 - lr: 0.000007
2023-10-25 15:15:43,263 DEV : loss 0.21227356791496277 - f1-score (micro avg) 0.7971
2023-10-25 15:15:43,284 ----------------------------------------------------------------------------------------------------
2023-10-25 15:15:49,118 epoch 9 - iter 89/893 - loss 0.00920994 - time (sec): 5.83 - samples/sec: 4231.87 - lr: 0.000006 - momentum: 0.000000
2023-10-25 15:15:54,872 epoch 9 - iter 178/893 - loss 0.00891932 - time (sec): 11.59 - samples/sec: 4226.85 - lr: 0.000006 - momentum: 0.000000
2023-10-25 15:16:00,508 epoch 9 - iter 267/893 - loss 0.01039436 - time (sec): 17.22 - samples/sec: 4278.15 - lr: 0.000006 - momentum: 0.000000
2023-10-25 15:16:06,782 epoch 9 - iter 356/893 - loss 0.01056567 - time (sec): 23.50 - samples/sec: 4283.75 - lr: 0.000005 - momentum: 0.000000
2023-10-25 15:16:12,664 epoch 9 - iter 445/893 - loss 0.01205154 - time (sec): 29.38 - samples/sec: 4283.65 - lr: 0.000005 - momentum: 0.000000
2023-10-25 15:16:18,609 epoch 9 - iter 534/893 - loss 0.01179973 - time (sec): 35.32 - samples/sec: 4285.62 - lr: 0.000005 - momentum: 0.000000
2023-10-25 15:16:24,456 epoch 9 - iter 623/893 - loss 0.01148151 - time (sec): 41.17 - samples/sec: 4253.07 - lr: 0.000004 - momentum: 0.000000
2023-10-25 15:16:30,157 epoch 9 - iter 712/893 - loss 0.01108116 - time (sec): 46.87 - samples/sec: 4279.34 - lr: 0.000004 - momentum: 0.000000
2023-10-25 15:16:35,747 epoch 9 - iter 801/893 - loss 0.01083303 - time (sec): 52.46 - samples/sec: 4280.96 - lr: 0.000004 - momentum: 0.000000
2023-10-25 15:16:41,190 epoch 9 - iter 890/893 - loss 0.01082633 - time (sec): 57.90 - samples/sec: 4284.21 - lr: 0.000003 - momentum: 0.000000
2023-10-25 15:16:41,365 ----------------------------------------------------------------------------------------------------
2023-10-25 15:16:41,366 EPOCH 9 done: loss 0.0108 - lr: 0.000003
2023-10-25 15:16:46,208 DEV : loss 0.21176157891750336 - f1-score (micro avg) 0.8104
2023-10-25 15:16:46,230 ----------------------------------------------------------------------------------------------------
2023-10-25 15:16:51,777 epoch 10 - iter 89/893 - loss 0.01317723 - time (sec): 5.55 - samples/sec: 4308.65 - lr: 0.000003 - momentum: 0.000000
2023-10-25 15:16:57,614 epoch 10 - iter 178/893 - loss 0.00887501 - time (sec): 11.38 - samples/sec: 4360.90 - lr: 0.000003 - momentum: 0.000000
2023-10-25 15:17:03,568 epoch 10 - iter 267/893 - loss 0.00690912 - time (sec): 17.34 - samples/sec: 4311.16 - lr: 0.000002 - momentum: 0.000000
2023-10-25 15:17:09,586 epoch 10 - iter 356/893 - loss 0.00707851 - time (sec): 23.35 - samples/sec: 4278.17 - lr: 0.000002 - momentum: 0.000000
2023-10-25 15:17:15,125 epoch 10 - iter 445/893 - loss 0.00657208 - time (sec): 28.89 - samples/sec: 4225.74 - lr: 0.000002 - momentum: 0.000000
2023-10-25 15:17:20,950 epoch 10 - iter 534/893 - loss 0.00656213 - time (sec): 34.72 - samples/sec: 4263.47 - lr: 0.000001 - momentum: 0.000000
2023-10-25 15:17:26,715 epoch 10 - iter 623/893 - loss 0.00748677 - time (sec): 40.48 - samples/sec: 4268.60 - lr: 0.000001 - momentum: 0.000000
2023-10-25 15:17:32,254 epoch 10 - iter 712/893 - loss 0.00756648 - time (sec): 46.02 - samples/sec: 4296.21 - lr: 0.000001 - momentum: 0.000000
2023-10-25 15:17:37,995 epoch 10 - iter 801/893 - loss 0.00771944 - time (sec): 51.76 - samples/sec: 4279.89 - lr: 0.000000 - momentum: 0.000000
2023-10-25 15:17:43,827 epoch 10 - iter 890/893 - loss 0.00812425 - time (sec): 57.60 - samples/sec: 4307.35 - lr: 0.000000 - momentum: 0.000000
2023-10-25 15:17:43,999 ----------------------------------------------------------------------------------------------------
2023-10-25 15:17:43,999 EPOCH 10 done: loss 0.0081 - lr: 0.000000
2023-10-25 15:17:47,978 DEV : loss 0.21720275282859802 - f1-score (micro avg) 0.8147
2023-10-25 15:17:47,999 saving best model
2023-10-25 15:17:49,119 ----------------------------------------------------------------------------------------------------
2023-10-25 15:17:49,120 Loading model from best epoch ...
2023-10-25 15:17:51,035 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
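The 17-tag dictionary above is a BIOES scheme over four entity types (PER, LOC, ORG, HumanProd): S- marks a single-token entity, B-/I-/E- mark the beginning, inside, and end of a multi-token one. A minimal decoder sketch (our own illustration, not Flair's internal code) turning such a tag sequence into entity spans:

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (label, start, end_exclusive) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
            continue
        prefix, lab = tag.split("-", 1)
        if prefix == "S":                      # single-token entity
            spans.append((lab, i, i + 1))
            start, label = None, None
        elif prefix == "B":                    # open a multi-token entity
            start, label = i, lab
        elif prefix == "E" and start is not None and label == lab:
            spans.append((lab, start, i + 1))  # close the open entity
            start, label = None, None
        # "I" simply continues an open entity
    return spans

# e.g. ["O", "S-PER", "B-LOC", "I-LOC", "E-LOC"]
#      -> [("PER", 1, 2), ("LOC", 2, 5)]
```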
2023-10-25 15:18:03,575
Results:
- F-score (micro) 0.6992
- F-score (macro) 0.6245
- Accuracy 0.5561
By class:
              precision    recall  f1-score   support

         LOC     0.7038    0.6986    0.7012      1095
         PER     0.7808    0.7816    0.7812      1012
         ORG     0.4549    0.5798    0.5099       357
   HumanProd     0.4074    0.6667    0.5057        33

   micro avg     0.6842    0.7149    0.6992      2497
   macro avg     0.5867    0.6817    0.6245      2497
weighted avg     0.6955    0.7149    0.7037      2497
2023-10-25 15:18:03,576 ----------------------------------------------------------------------------------------------------
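The aggregate scores in the report can be re-derived from its own rows; a small consistency check (our own helper, not part of Flair): micro F1 is the harmonic mean of the micro-averaged precision and recall, and macro F1 is the unweighted mean of the per-class F1 scores.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

per_class_f1 = [0.7012, 0.7812, 0.5099, 0.5057]  # LOC, PER, ORG, HumanProd

micro_f1 = f1(0.6842, 0.7149)                     # ~0.6992, matching "F-score (micro)"
macro_f1 = sum(per_class_f1) / len(per_class_f1)  # 0.6245, matching "F-score (macro)"
```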