stefan-it committed on
Commit
a9e8dc0
1 Parent(s): 52f8df6

Upload ./training.log with huggingface_hub

  1. training.log +242 -0
training.log ADDED
@@ -0,0 +1,242 @@
+ 2023-10-25 21:32:57,504 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,504 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): BertModel(
+       (embeddings): BertEmbeddings(
+         (word_embeddings): Embedding(64001, 768)
+         (position_embeddings): Embedding(512, 768)
+         (token_type_embeddings): Embedding(2, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): BertEncoder(
+         (layer): ModuleList(
+           (0-11): 12 x BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): BertPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 MultiCorpus: 1166 train + 165 dev + 415 test sentences
+ - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 Train: 1166 sentences
+ 2023-10-25 21:32:57,505 (train_with_dev=False, train_with_test=False)
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 Training Params:
+ 2023-10-25 21:32:57,505 - learning_rate: "5e-05"
+ 2023-10-25 21:32:57,505 - mini_batch_size: "4"
+ 2023-10-25 21:32:57,505 - max_epochs: "10"
+ 2023-10-25 21:32:57,505 - shuffle: "True"
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 Plugins:
+ 2023-10-25 21:32:57,505 - TensorboardLogger
+ 2023-10-25 21:32:57,505 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
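The parameters above (learning rate 5e-05, mini-batch size 4, 10 epochs, `LinearScheduler` with `warmup_fraction` 0.1) suggest a linear warmup followed by a linear decay to zero. The following is a minimal sketch of such a schedule, not Flair's own scheduler code: the function name `linear_schedule_lr` and the zero-based step counter are assumptions made for illustration.

```python
import math

# Values copied from the log above.
TRAIN_SENTENCES = 1166
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
PEAK_LR = 5e-5
WARMUP_FRACTION = 0.1

# 1166 sentences in batches of 4 give 292 iterations per epoch,
# which matches the "iter .../292" counters in the log.
steps_per_epoch = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)
total_steps = steps_per_epoch * MAX_EPOCHS
warmup_steps = round(total_steps * WARMUP_FRACTION)


def linear_schedule_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps (hypothetical helper).

    Rises linearly from 0 to PEAK_LR during warmup, then falls
    linearly back to 0 at `total_steps`.
    """
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 146, 292, 1606, 2920):
        print(f"step {step:4d}: lr {linear_schedule_lr(step):.6f}")
```

Under these assumptions the warmup spans exactly the first epoch (292 of 2920 steps), so the rate peaks at the end of epoch 1 and reaches zero at the end of epoch 10.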
+ 2023-10-25 21:32:57,505 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-25 21:32:57,505 - metric: "('micro avg', 'f1-score')"
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 Computation:
+ 2023-10-25 21:32:57,505 - compute on device: cuda:0
+ 2023-10-25 21:32:57,505 - embedding storage: none
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 Model training base path: "hmbench-newseye/fi-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-5"
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:32:57,505 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-25 21:32:58,806 epoch 1 - iter 29/292 - loss 2.45137787 - time (sec): 1.30 - samples/sec: 3120.17 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 21:33:00,122 epoch 1 - iter 58/292 - loss 1.70444850 - time (sec): 2.62 - samples/sec: 2993.17 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 21:33:01,467 epoch 1 - iter 87/292 - loss 1.36569673 - time (sec): 3.96 - samples/sec: 3159.08 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 21:33:02,798 epoch 1 - iter 116/292 - loss 1.13542738 - time (sec): 5.29 - samples/sec: 3233.59 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 21:33:04,065 epoch 1 - iter 145/292 - loss 0.95947046 - time (sec): 6.56 - samples/sec: 3311.90 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 21:33:05,349 epoch 1 - iter 174/292 - loss 0.86059002 - time (sec): 7.84 - samples/sec: 3267.81 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 21:33:06,715 epoch 1 - iter 203/292 - loss 0.75774356 - time (sec): 9.21 - samples/sec: 3338.69 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-25 21:33:08,059 epoch 1 - iter 232/292 - loss 0.68013281 - time (sec): 10.55 - samples/sec: 3389.51 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-25 21:33:09,369 epoch 1 - iter 261/292 - loss 0.63213092 - time (sec): 11.86 - samples/sec: 3397.16 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-25 21:33:10,649 epoch 1 - iter 290/292 - loss 0.59913023 - time (sec): 13.14 - samples/sec: 3363.80 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-25 21:33:10,733 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:10,733 EPOCH 1 done: loss 0.5985 - lr: 0.000049
+ 2023-10-25 21:33:11,405 DEV : loss 0.12858878076076508 - f1-score (micro avg) 0.6058
+ 2023-10-25 21:33:11,409 saving best model
+ 2023-10-25 21:33:11,920 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:13,209 epoch 2 - iter 29/292 - loss 0.15207670 - time (sec): 1.29 - samples/sec: 3462.10 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-25 21:33:14,531 epoch 2 - iter 58/292 - loss 0.17254937 - time (sec): 2.61 - samples/sec: 3369.96 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-25 21:33:15,837 epoch 2 - iter 87/292 - loss 0.16515785 - time (sec): 3.92 - samples/sec: 3320.56 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-25 21:33:17,171 epoch 2 - iter 116/292 - loss 0.15565784 - time (sec): 5.25 - samples/sec: 3367.25 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-25 21:33:18,426 epoch 2 - iter 145/292 - loss 0.15034919 - time (sec): 6.51 - samples/sec: 3346.07 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-25 21:33:19,663 epoch 2 - iter 174/292 - loss 0.15226906 - time (sec): 7.74 - samples/sec: 3387.64 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-25 21:33:20,920 epoch 2 - iter 203/292 - loss 0.15082330 - time (sec): 9.00 - samples/sec: 3375.79 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-25 21:33:22,195 epoch 2 - iter 232/292 - loss 0.15194885 - time (sec): 10.27 - samples/sec: 3383.93 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-25 21:33:23,556 epoch 2 - iter 261/292 - loss 0.15456539 - time (sec): 11.63 - samples/sec: 3380.94 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-25 21:33:24,901 epoch 2 - iter 290/292 - loss 0.14927010 - time (sec): 12.98 - samples/sec: 3393.23 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-25 21:33:24,981 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:24,981 EPOCH 2 done: loss 0.1483 - lr: 0.000045
+ 2023-10-25 21:33:25,896 DEV : loss 0.1477406919002533 - f1-score (micro avg) 0.6391
+ 2023-10-25 21:33:25,900 saving best model
+ 2023-10-25 21:33:26,570 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:28,092 epoch 3 - iter 29/292 - loss 0.09222681 - time (sec): 1.52 - samples/sec: 3853.44 - lr: 0.000044 - momentum: 0.000000
+ 2023-10-25 21:33:29,401 epoch 3 - iter 58/292 - loss 0.09581570 - time (sec): 2.83 - samples/sec: 3688.13 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-25 21:33:30,640 epoch 3 - iter 87/292 - loss 0.09699714 - time (sec): 4.07 - samples/sec: 3597.96 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-25 21:33:31,923 epoch 3 - iter 116/292 - loss 0.09432914 - time (sec): 5.35 - samples/sec: 3493.39 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-25 21:33:33,252 epoch 3 - iter 145/292 - loss 0.09100948 - time (sec): 6.68 - samples/sec: 3462.50 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-25 21:33:34,541 epoch 3 - iter 174/292 - loss 0.08813279 - time (sec): 7.97 - samples/sec: 3403.92 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-25 21:33:35,854 epoch 3 - iter 203/292 - loss 0.08542829 - time (sec): 9.28 - samples/sec: 3409.01 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-25 21:33:37,116 epoch 3 - iter 232/292 - loss 0.08503915 - time (sec): 10.54 - samples/sec: 3326.93 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-25 21:33:38,435 epoch 3 - iter 261/292 - loss 0.08523882 - time (sec): 11.86 - samples/sec: 3379.59 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-25 21:33:39,681 epoch 3 - iter 290/292 - loss 0.08483685 - time (sec): 13.11 - samples/sec: 3373.62 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-25 21:33:39,767 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:39,768 EPOCH 3 done: loss 0.0858 - lr: 0.000039
+ 2023-10-25 21:33:40,839 DEV : loss 0.1286671906709671 - f1-score (micro avg) 0.7152
+ 2023-10-25 21:33:40,843 saving best model
+ 2023-10-25 21:33:41,512 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:42,842 epoch 4 - iter 29/292 - loss 0.06579822 - time (sec): 1.33 - samples/sec: 3197.68 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-25 21:33:44,181 epoch 4 - iter 58/292 - loss 0.05383181 - time (sec): 2.67 - samples/sec: 3158.85 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-25 21:33:45,500 epoch 4 - iter 87/292 - loss 0.04876507 - time (sec): 3.98 - samples/sec: 3274.58 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-25 21:33:46,811 epoch 4 - iter 116/292 - loss 0.04823903 - time (sec): 5.29 - samples/sec: 3210.16 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-25 21:33:48,194 epoch 4 - iter 145/292 - loss 0.05376063 - time (sec): 6.68 - samples/sec: 3382.61 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-25 21:33:49,494 epoch 4 - iter 174/292 - loss 0.05346591 - time (sec): 7.98 - samples/sec: 3433.92 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-25 21:33:50,743 epoch 4 - iter 203/292 - loss 0.05630322 - time (sec): 9.23 - samples/sec: 3443.19 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-25 21:33:52,047 epoch 4 - iter 232/292 - loss 0.05797129 - time (sec): 10.53 - samples/sec: 3396.68 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-25 21:33:53,474 epoch 4 - iter 261/292 - loss 0.05843650 - time (sec): 11.96 - samples/sec: 3381.69 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-25 21:33:54,735 epoch 4 - iter 290/292 - loss 0.05750520 - time (sec): 13.22 - samples/sec: 3343.54 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-25 21:33:54,814 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:54,814 EPOCH 4 done: loss 0.0572 - lr: 0.000033
+ 2023-10-25 21:33:55,722 DEV : loss 0.1722760796546936 - f1-score (micro avg) 0.7025
+ 2023-10-25 21:33:55,726 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:33:56,988 epoch 5 - iter 29/292 - loss 0.04342491 - time (sec): 1.26 - samples/sec: 3606.54 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-25 21:33:58,245 epoch 5 - iter 58/292 - loss 0.04319897 - time (sec): 2.52 - samples/sec: 3469.22 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-25 21:33:59,571 epoch 5 - iter 87/292 - loss 0.03827778 - time (sec): 3.84 - samples/sec: 3389.04 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-25 21:34:00,840 epoch 5 - iter 116/292 - loss 0.03386059 - time (sec): 5.11 - samples/sec: 3404.01 - lr: 0.000031 - momentum: 0.000000
+ 2023-10-25 21:34:02,129 epoch 5 - iter 145/292 - loss 0.03593760 - time (sec): 6.40 - samples/sec: 3414.83 - lr: 0.000031 - momentum: 0.000000
+ 2023-10-25 21:34:03,401 epoch 5 - iter 174/292 - loss 0.03846183 - time (sec): 7.67 - samples/sec: 3360.64 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 21:34:04,755 epoch 5 - iter 203/292 - loss 0.03960139 - time (sec): 9.03 - samples/sec: 3365.46 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 21:34:06,006 epoch 5 - iter 232/292 - loss 0.03978996 - time (sec): 10.28 - samples/sec: 3462.51 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 21:34:07,231 epoch 5 - iter 261/292 - loss 0.04007028 - time (sec): 11.50 - samples/sec: 3481.69 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 21:34:08,469 epoch 5 - iter 290/292 - loss 0.03905369 - time (sec): 12.74 - samples/sec: 3476.05 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 21:34:08,549 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:08,550 EPOCH 5 done: loss 0.0390 - lr: 0.000028
+ 2023-10-25 21:34:09,457 DEV : loss 0.16055038571357727 - f1-score (micro avg) 0.6835
+ 2023-10-25 21:34:09,462 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:10,800 epoch 6 - iter 29/292 - loss 0.03141562 - time (sec): 1.34 - samples/sec: 3701.70 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 21:34:12,095 epoch 6 - iter 58/292 - loss 0.03968166 - time (sec): 2.63 - samples/sec: 3404.87 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 21:34:13,387 epoch 6 - iter 87/292 - loss 0.03196566 - time (sec): 3.92 - samples/sec: 3450.89 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 21:34:14,722 epoch 6 - iter 116/292 - loss 0.03582229 - time (sec): 5.26 - samples/sec: 3465.57 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 21:34:15,994 epoch 6 - iter 145/292 - loss 0.03410517 - time (sec): 6.53 - samples/sec: 3474.12 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 21:34:17,304 epoch 6 - iter 174/292 - loss 0.03262458 - time (sec): 7.84 - samples/sec: 3455.30 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 21:34:18,567 epoch 6 - iter 203/292 - loss 0.03056044 - time (sec): 9.10 - samples/sec: 3418.19 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 21:34:19,863 epoch 6 - iter 232/292 - loss 0.03001793 - time (sec): 10.40 - samples/sec: 3390.09 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 21:34:21,193 epoch 6 - iter 261/292 - loss 0.03043669 - time (sec): 11.73 - samples/sec: 3399.43 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 21:34:22,499 epoch 6 - iter 290/292 - loss 0.03080981 - time (sec): 13.04 - samples/sec: 3371.72 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 21:34:22,589 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:22,589 EPOCH 6 done: loss 0.0306 - lr: 0.000022
+ 2023-10-25 21:34:23,501 DEV : loss 0.19378620386123657 - f1-score (micro avg) 0.7133
+ 2023-10-25 21:34:23,506 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:24,816 epoch 7 - iter 29/292 - loss 0.02249619 - time (sec): 1.31 - samples/sec: 3713.89 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 21:34:26,141 epoch 7 - iter 58/292 - loss 0.03051413 - time (sec): 2.63 - samples/sec: 3714.66 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 21:34:27,389 epoch 7 - iter 87/292 - loss 0.03401934 - time (sec): 3.88 - samples/sec: 3562.14 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 21:34:28,671 epoch 7 - iter 116/292 - loss 0.03101955 - time (sec): 5.16 - samples/sec: 3443.23 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 21:34:29,951 epoch 7 - iter 145/292 - loss 0.02682969 - time (sec): 6.44 - samples/sec: 3368.08 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 21:34:31,346 epoch 7 - iter 174/292 - loss 0.02528320 - time (sec): 7.84 - samples/sec: 3392.08 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 21:34:32,684 epoch 7 - iter 203/292 - loss 0.02397947 - time (sec): 9.18 - samples/sec: 3398.50 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 21:34:33,987 epoch 7 - iter 232/292 - loss 0.02354003 - time (sec): 10.48 - samples/sec: 3367.45 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 21:34:35,297 epoch 7 - iter 261/292 - loss 0.02151238 - time (sec): 11.79 - samples/sec: 3361.83 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 21:34:36,578 epoch 7 - iter 290/292 - loss 0.02098365 - time (sec): 13.07 - samples/sec: 3389.04 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 21:34:36,653 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:36,654 EPOCH 7 done: loss 0.0209 - lr: 0.000017
+ 2023-10-25 21:34:37,749 DEV : loss 0.1828424036502838 - f1-score (micro avg) 0.7832
+ 2023-10-25 21:34:37,754 saving best model
+ 2023-10-25 21:34:38,425 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:39,867 epoch 8 - iter 29/292 - loss 0.02744473 - time (sec): 1.44 - samples/sec: 3042.64 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 21:34:41,255 epoch 8 - iter 58/292 - loss 0.02406253 - time (sec): 2.83 - samples/sec: 3124.97 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 21:34:42,541 epoch 8 - iter 87/292 - loss 0.01768261 - time (sec): 4.11 - samples/sec: 3273.21 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 21:34:43,782 epoch 8 - iter 116/292 - loss 0.01689601 - time (sec): 5.35 - samples/sec: 3291.07 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 21:34:45,038 epoch 8 - iter 145/292 - loss 0.01534873 - time (sec): 6.61 - samples/sec: 3295.91 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 21:34:46,352 epoch 8 - iter 174/292 - loss 0.01693279 - time (sec): 7.92 - samples/sec: 3280.78 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 21:34:47,614 epoch 8 - iter 203/292 - loss 0.01577629 - time (sec): 9.19 - samples/sec: 3233.29 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 21:34:48,896 epoch 8 - iter 232/292 - loss 0.01567376 - time (sec): 10.47 - samples/sec: 3264.57 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 21:34:50,171 epoch 8 - iter 261/292 - loss 0.01480935 - time (sec): 11.74 - samples/sec: 3318.43 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 21:34:51,559 epoch 8 - iter 290/292 - loss 0.01450321 - time (sec): 13.13 - samples/sec: 3369.14 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 21:34:51,643 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:51,643 EPOCH 8 done: loss 0.0144 - lr: 0.000011
+ 2023-10-25 21:34:52,565 DEV : loss 0.2024029940366745 - f1-score (micro avg) 0.7134
+ 2023-10-25 21:34:52,569 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:34:53,951 epoch 9 - iter 29/292 - loss 0.00639943 - time (sec): 1.38 - samples/sec: 3617.05 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 21:34:55,179 epoch 9 - iter 58/292 - loss 0.00947452 - time (sec): 2.61 - samples/sec: 3552.62 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 21:34:56,462 epoch 9 - iter 87/292 - loss 0.00782213 - time (sec): 3.89 - samples/sec: 3553.60 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 21:34:57,797 epoch 9 - iter 116/292 - loss 0.01172703 - time (sec): 5.23 - samples/sec: 3543.78 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 21:34:59,111 epoch 9 - iter 145/292 - loss 0.01086021 - time (sec): 6.54 - samples/sec: 3507.32 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 21:35:00,406 epoch 9 - iter 174/292 - loss 0.01055746 - time (sec): 7.84 - samples/sec: 3482.07 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 21:35:01,686 epoch 9 - iter 203/292 - loss 0.00948365 - time (sec): 9.12 - samples/sec: 3480.90 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 21:35:02,926 epoch 9 - iter 232/292 - loss 0.00922094 - time (sec): 10.36 - samples/sec: 3441.15 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 21:35:04,220 epoch 9 - iter 261/292 - loss 0.00941786 - time (sec): 11.65 - samples/sec: 3399.49 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 21:35:05,542 epoch 9 - iter 290/292 - loss 0.00868448 - time (sec): 12.97 - samples/sec: 3404.17 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 21:35:05,627 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:35:05,627 EPOCH 9 done: loss 0.0086 - lr: 0.000006
+ 2023-10-25 21:35:06,545 DEV : loss 0.21211808919906616 - f1-score (micro avg) 0.7403
+ 2023-10-25 21:35:06,549 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:35:07,830 epoch 10 - iter 29/292 - loss 0.00154113 - time (sec): 1.28 - samples/sec: 3413.30 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 21:35:09,139 epoch 10 - iter 58/292 - loss 0.00087880 - time (sec): 2.59 - samples/sec: 3179.05 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 21:35:10,428 epoch 10 - iter 87/292 - loss 0.00742925 - time (sec): 3.88 - samples/sec: 3189.66 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 21:35:11,675 epoch 10 - iter 116/292 - loss 0.00728705 - time (sec): 5.12 - samples/sec: 3255.16 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 21:35:13,053 epoch 10 - iter 145/292 - loss 0.00611792 - time (sec): 6.50 - samples/sec: 3311.10 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 21:35:14,289 epoch 10 - iter 174/292 - loss 0.00635455 - time (sec): 7.74 - samples/sec: 3345.65 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 21:35:15,631 epoch 10 - iter 203/292 - loss 0.00623399 - time (sec): 9.08 - samples/sec: 3401.46 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 21:35:16,944 epoch 10 - iter 232/292 - loss 0.00631900 - time (sec): 10.39 - samples/sec: 3381.05 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 21:35:18,296 epoch 10 - iter 261/292 - loss 0.00618350 - time (sec): 11.75 - samples/sec: 3378.37 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 21:35:19,587 epoch 10 - iter 290/292 - loss 0.00631463 - time (sec): 13.04 - samples/sec: 3395.56 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-25 21:35:19,663 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 21:35:19,663 EPOCH 10 done: loss 0.0063 - lr: 0.000000
+ 2023-10-25 21:35:20,570 DEV : loss 0.21458660066127777 - f1-score (micro avg) 0.7179
+ 2023-10-25 21:35:21,093 ----------------------------------------------------------------------------------------------------
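The `DEV` lines above determine which checkpoint is kept as the best model: a new best model is saved only when the dev micro F1 improves. As a rough illustration, the sketch below extracts the dev F1 per epoch and picks the best one. The helper name `best_dev_epoch` and the regular expression are assumptions written for this log format, and the code assumes exactly one `DEV` line per epoch, in order.

```python
import re

# Matches lines such as:
#   "... DEV : loss 0.1285... - f1-score (micro avg) 0.6058"
DEV_PATTERN = re.compile(
    r"DEV : loss (\d+\.\d+) - f1-score \(micro avg\)\s+(\d+\.\d+)"
)


def best_dev_epoch(log_lines):
    """Return (epoch, f1) for the highest dev micro F1 (hypothetical helper).

    Epochs are counted from 1 in the order the DEV lines appear.
    """
    best_epoch, best_f1 = None, float("-inf")
    epoch = 0
    for line in log_lines:
        match = DEV_PATTERN.search(line)
        if match is None:
            continue
        epoch += 1
        f1 = float(match.group(2))
        if f1 > best_f1:
            best_epoch, best_f1 = epoch, f1
    return best_epoch, best_f1


# DEV lines copied from the log above.
DEV_LINES = [
    "2023-10-25 21:33:11,405 DEV : loss 0.12858878076076508 - f1-score (micro avg) 0.6058",
    "2023-10-25 21:33:25,896 DEV : loss 0.1477406919002533 - f1-score (micro avg) 0.6391",
    "2023-10-25 21:33:40,839 DEV : loss 0.1286671906709671 - f1-score (micro avg) 0.7152",
    "2023-10-25 21:33:55,722 DEV : loss 0.1722760796546936 - f1-score (micro avg) 0.7025",
    "2023-10-25 21:34:09,457 DEV : loss 0.16055038571357727 - f1-score (micro avg) 0.6835",
    "2023-10-25 21:34:23,501 DEV : loss 0.19378620386123657 - f1-score (micro avg) 0.7133",
    "2023-10-25 21:34:37,749 DEV : loss 0.1828424036502838 - f1-score (micro avg) 0.7832",
    "2023-10-25 21:34:52,565 DEV : loss 0.2024029940366745 - f1-score (micro avg) 0.7134",
    "2023-10-25 21:35:06,545 DEV : loss 0.21211808919906616 - f1-score (micro avg) 0.7403",
    "2023-10-25 21:35:20,570 DEV : loss 0.21458660066127777 - f1-score (micro avg) 0.7179",
]

if __name__ == "__main__":
    print(best_dev_epoch(DEV_LINES))
```

On these lines the best dev score comes from epoch 7, which is also the last epoch followed by a `saving best model` message in the log.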
+ 2023-10-25 21:35:21,094 Loading model from best epoch ...
+ 2023-10-25 21:35:22,805 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
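The 17 tags listed above follow a BIOES-style scheme: one `O` tag plus four prefixes (`S`, `B`, `E`, `I`) for each of the four entity types, which matches the `out_features=17` of the final linear layer. A small sketch that rebuilds the list, with `build_bioes_tags` as a hypothetical helper name:

```python
# Entity types and prefix order as they appear in the logged dictionary.
ENTITY_TYPES = ["LOC", "PER", "ORG", "HumanProd"]
PREFIXES = ["S", "B", "E", "I"]


def build_bioes_tags(entity_types):
    """Return the 'O' tag followed by every prefix-type combination."""
    tags = ["O"]
    for entity in entity_types:
        tags.extend(f"{prefix}-{entity}" for prefix in PREFIXES)
    return tags


tags = build_bioes_tags(ENTITY_TYPES)

if __name__ == "__main__":
    print(len(tags), tags)
```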
+ 2023-10-25 21:35:24,351
+ Results:
+ - F-score (micro) 0.7601
+ - F-score (macro) 0.6983
+ - Accuracy 0.6367
+
+ By class:
+               precision    recall  f1-score   support
+
+          PER     0.8000    0.8391    0.8191       348
+          LOC     0.6709    0.8123    0.7348       261
+          ORG     0.5102    0.4808    0.4950        52
+    HumanProd     0.7619    0.7273    0.7442        22
+
+    micro avg     0.7257    0.7980    0.7601       683
+    macro avg     0.6857    0.7148    0.6983       683
+ weighted avg     0.7274    0.7980    0.7598       683
+
+ 2023-10-25 21:35:24,351 ----------------------------------------------------------------------------------------------------
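The averaged rows of the results table can be re-derived from the per-class rows. The sketch below is a sanity check rather than the evaluation code that produced the log: it recomputes F1 as the harmonic mean of precision and recall, the macro F1 as the unweighted mean of per-class F1, and the weighted F1 using support as weights. All names are illustrative, and because the inputs are rounded to four decimals the results agree with the log only up to rounding.

```python
# (precision, recall, f1, support) per class, copied from the table above.
BY_CLASS = {
    "PER": (0.8000, 0.8391, 0.8191, 348),
    "LOC": (0.6709, 0.8123, 0.7348, 261),
    "ORG": (0.5102, 0.4808, 0.4950, 52),
    "HumanProd": (0.7619, 0.7273, 0.7442, 22),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total_support = sum(support for _, _, _, support in BY_CLASS.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for _, _, f1, _ in BY_CLASS.values()) / len(BY_CLASS)

# Weighted average: classes count in proportion to their support.
weighted_f1 = (
    sum(f1 * support for _, _, f1, support in BY_CLASS.values()) / total_support
)

# Micro F1 from the logged micro-averaged precision and recall.
micro_f1 = f1_score(0.7257, 0.7980)

if __name__ == "__main__":
    print(f"support      {total_support}")
    print(f"micro F1     {micro_f1:.4f}")
    print(f"macro F1     {macro_f1:.4f}")
    print(f"weighted F1  {weighted_f1:.4f}")
```

The gap between micro F1 (0.7601) and macro F1 (0.6983) comes mostly from ORG, which has a low F1 (0.4950) but only 52 of the 683 gold entities.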