Uploaded the model
- dev.tsv +0 -0
- final-model.pt +3 -0
- loss.tsv +11 -0
- test.tsv +0 -0
- training.log +499 -0
- weights.txt +0 -0
dev.tsv
ADDED
The diff for this file is too large to render.
See raw diff
final-model.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:540573dc73c98f5b1fa42c789f4465e7f1b2c7f326d7461dfdeada7d2522644b
+size 444998061
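
final-model.pt is a regular Flair checkpoint (~445 MB, stored as the Git LFS pointer above), so it can be loaded directly with Flair's `SequenceTagger`. A minimal usage sketch; the example sentence is a placeholder, and the checkpoint is assumed to have been downloaded (LFS pointer resolved) into the working directory:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint uploaded in this commit.
tagger = SequenceTagger.load("final-model.pt")

# Tag a sentence; per the test report in training.log below, the label
# set covers pers, loc, amount, time, func, org, prod and event.
sentence = Sentence("Gallien s'était réfugié à Milan.")
tagger.predict(sentence)

for entity in sentence.get_spans("ner"):  # assumes the tag type is "ner"
    print(entity)
```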
loss.tsv
ADDED
@@ -0,0 +1,11 @@
+EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+1 02:16:47 4 0.0001 0.2182895945059601 0.0355144739151001 0.7661 0.8947 0.8254 0.722
+2 03:26:13 4 0.0000 0.13729935181138425 0.015243684872984886 0.9007 0.926 0.9132 0.8548
+3 04:35:09 4 0.0000 0.11197359439314927 0.016585879027843475 0.9119 0.9342 0.9229 0.8697
+4 05:44:00 4 0.0000 0.09147635538963178 0.016923826187849045 0.9132 0.9296 0.9213 0.8708
+5 06:52:15 4 0.0000 0.07495889990317275 0.017464155331254005 0.9377 0.9246 0.9311 0.8831
+6 08:00:45 4 0.0000 0.061747689342078395 0.01982131227850914 0.9348 0.9369 0.9358 0.8909
+7 09:09:20 4 0.0000 0.0519030773124998 0.02467426098883152 0.9395 0.9315 0.9355 0.892
+8 10:18:00 4 0.0000 0.04503195115695853 0.02364770695567131 0.9306 0.9438 0.9371 0.8939
+9 11:26:37 4 0.0000 0.040509963133028556 0.026182951405644417 0.9328 0.9394 0.9361 0.8942
+10 12:34:45 4 0.0000 0.03798332489249556 0.027400659397244453 0.9349 0.9388 0.9368 0.8943
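
loss.tsv is plain tab-separated text with one row per epoch, so the training curve is easy to inspect programmatically. A small sketch, assuming the file has been downloaded to the working directory:

```python
import pandas as pd

# Columns: epoch index, wall-clock timestamp, bad-epoch counter, learning
# rate, train/dev loss, and dev precision/recall/F1/accuracy.
metrics = pd.read_csv("loss.tsv", sep="\t")

best = metrics.loc[metrics["DEV_F1"].idxmax()]
print(f"Best dev F1 {best['DEV_F1']:.4f} at epoch {int(best['EPOCH'])}")
```

Note that TRAIN_LOSS falls monotonically while DEV_LOSS bottoms out at epoch 2; dev F1 nevertheless keeps improving until epoch 8 (0.9371).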
test.tsv
ADDED
The diff for this file is too large to render.
See raw diff
training.log
ADDED
@@ -0,0 +1,499 @@
+2022-02-05 01:08:47,419 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:08:47,461 Model: "SequenceTagger(
+  (embeddings): TransformerWordEmbeddings(
+    (model): RobertaModel(
+      (embeddings): RobertaEmbeddings(
+        (word_embeddings): Embedding(32768, 768, padding_idx=1)
+        (position_embeddings): Embedding(514, 768, padding_idx=1)
+        (token_type_embeddings): Embedding(1, 768)
+        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+        (dropout): Dropout(p=0.1, inplace=False)
+      )
+      (encoder): RobertaEncoder(
+        (layer): ModuleList(
+          (0): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (1): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (2): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (3): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (4): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (5): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (6): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (7): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (8): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (9): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (10): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+          (11): RobertaLayer(
+            (attention): RobertaAttention(
+              (self): RobertaSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): RobertaSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): RobertaIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+            )
+            (output): RobertaOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+        )
+      )
+      (pooler): RobertaPooler(
+        (dense): Linear(in_features=768, out_features=768, bias=True)
+        (activation): Tanh()
+      )
+    )
+  )
+  (word_dropout): WordDropout(p=0.05)
+  (locked_dropout): LockedDropout(p=0.5)
+  (linear): Linear(in_features=768, out_features=18, bias=True)
+  (beta): 1.0
+  (weights): None
+  (weight_tensor) None
+)"
+2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:08:47,466 Corpus: "Corpus: 126973 train + 7037 dev + 7090 test sentences"
+2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:08:47,466 Parameters:
+2022-02-05 01:08:47,466 - learning_rate: "5e-05"
+2022-02-05 01:08:47,466 - mini_batch_size: "16"
+2022-02-05 01:08:47,466 - patience: "3"
+2022-02-05 01:08:47,466 - anneal_factor: "0.5"
+2022-02-05 01:08:47,466 - max_epochs: "10"
+2022-02-05 01:08:47,466 - shuffle: "True"
+2022-02-05 01:08:47,466 - train_with_dev: "False"
+2022-02-05 01:08:47,466 - batch_growth_annealing: "False"
+2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:08:47,466 Model training base path: "resources/taggers/ner-dalembert-2ndtry"
+2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:08:47,466 Device: cuda:0
+2022-02-05 01:08:47,466 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:08:47,467 Embeddings storage mode: none
+2022-02-05 01:08:47,469 ----------------------------------------------------------------------------------------------------
+2022-02-05 01:15:08,771 epoch 1 - iter 793/7936 - loss 0.78007372 - samples/sec: 33.28 - lr: 0.000005
+2022-02-05 01:22:45,940 epoch 1 - iter 1586/7936 - loss 0.41932043 - samples/sec: 27.76 - lr: 0.000010
+2022-02-05 01:29:23,897 epoch 1 - iter 2379/7936 - loss 0.33514542 - samples/sec: 31.89 - lr: 0.000015
+2022-02-05 01:35:24,915 epoch 1 - iter 3172/7936 - loss 0.30212998 - samples/sec: 35.15 - lr: 0.000020
+2022-02-05 01:42:28,297 epoch 1 - iter 3965/7936 - loss 0.27341208 - samples/sec: 29.97 - lr: 0.000025
+2022-02-05 01:49:23,543 epoch 1 - iter 4758/7936 - loss 0.25403588 - samples/sec: 30.56 - lr: 0.000030
+2022-02-05 01:55:46,783 epoch 1 - iter 5551/7936 - loss 0.24241496 - samples/sec: 33.11 - lr: 0.000035
+2022-02-05 02:01:45,654 epoch 1 - iter 6344/7936 - loss 0.23381719 - samples/sec: 35.36 - lr: 0.000040
+2022-02-05 02:07:29,407 epoch 1 - iter 7137/7936 - loss 0.22586308 - samples/sec: 36.92 - lr: 0.000045
+2022-02-05 02:13:54,603 epoch 1 - iter 7930/7936 - loss 0.21834611 - samples/sec: 32.94 - lr: 0.000050
+2022-02-05 02:13:57,692 ----------------------------------------------------------------------------------------------------
+2022-02-05 02:13:57,693 EPOCH 1 done: loss 0.2183 - lr 0.0000500
+2022-02-05 02:16:47,190 DEV : loss 0.0355144739151001 - f1-score (micro avg) 0.8254
+2022-02-05 02:16:47,244 BAD EPOCHS (no improvement): 4
+2022-02-05 02:16:47,244 ----------------------------------------------------------------------------------------------------
+2022-02-05 02:23:15,435 epoch 2 - iter 793/7936 - loss 0.14903310 - samples/sec: 32.69 - lr: 0.000049
+2022-02-05 02:30:06,605 epoch 2 - iter 1586/7936 - loss 0.14777394 - samples/sec: 30.86 - lr: 0.000049
+2022-02-05 02:36:48,570 epoch 2 - iter 2379/7936 - loss 0.14637300 - samples/sec: 31.57 - lr: 0.000048
+2022-02-05 02:43:37,172 epoch 2 - iter 3172/7936 - loss 0.14491485 - samples/sec: 31.06 - lr: 0.000048
+2022-02-05 02:50:13,040 epoch 2 - iter 3965/7936 - loss 0.14361996 - samples/sec: 32.06 - lr: 0.000047
+2022-02-05 02:56:49,904 epoch 2 - iter 4758/7936 - loss 0.14232123 - samples/sec: 31.98 - lr: 0.000047
+2022-02-05 03:03:34,383 epoch 2 - iter 5551/7936 - loss 0.14116820 - samples/sec: 31.38 - lr: 0.000046
+2022-02-05 03:10:09,778 epoch 2 - iter 6344/7936 - loss 0.14001072 - samples/sec: 32.10 - lr: 0.000046
+2022-02-05 03:16:43,847 epoch 2 - iter 7137/7936 - loss 0.13868572 - samples/sec: 32.20 - lr: 0.000045
+2022-02-05 03:23:28,994 epoch 2 - iter 7930/7936 - loss 0.13731517 - samples/sec: 31.33 - lr: 0.000044
+2022-02-05 03:23:31,622 ----------------------------------------------------------------------------------------------------
+2022-02-05 03:23:31,623 EPOCH 2 done: loss 0.1373 - lr 0.0000444
+2022-02-05 03:26:13,727 DEV : loss 0.015243684872984886 - f1-score (micro avg) 0.9132
+2022-02-05 03:26:13,788 BAD EPOCHS (no improvement): 4
+2022-02-05 03:26:13,806 ----------------------------------------------------------------------------------------------------
+2022-02-05 03:32:57,765 epoch 3 - iter 793/7936 - loss 0.11924788 - samples/sec: 31.42 - lr: 0.000044
+2022-02-05 03:39:33,229 epoch 3 - iter 1586/7936 - loss 0.11867811 - samples/sec: 32.09 - lr: 0.000043
+2022-02-05 03:46:09,619 epoch 3 - iter 2379/7936 - loss 0.11819415 - samples/sec: 32.01 - lr: 0.000043
+2022-02-05 03:52:49,510 epoch 3 - iter 3172/7936 - loss 0.11779082 - samples/sec: 31.74 - lr: 0.000042
+2022-02-05 03:59:27,917 epoch 3 - iter 3965/7936 - loss 0.11691604 - samples/sec: 31.85 - lr: 0.000042
+2022-02-05 04:06:01,365 epoch 3 - iter 4758/7936 - loss 0.11592267 - samples/sec: 32.26 - lr: 0.000041
+2022-02-05 04:12:41,174 epoch 3 - iter 5551/7936 - loss 0.11480043 - samples/sec: 31.74 - lr: 0.000041
+2022-02-05 04:19:14,243 epoch 3 - iter 6344/7936 - loss 0.11389582 - samples/sec: 32.29 - lr: 0.000040
+2022-02-05 04:25:45,192 epoch 3 - iter 7137/7936 - loss 0.11289267 - samples/sec: 32.46 - lr: 0.000039
+2022-02-05 04:32:26,310 epoch 3 - iter 7930/7936 - loss 0.11196899 - samples/sec: 31.64 - lr: 0.000039
+2022-02-05 04:32:29,352 ----------------------------------------------------------------------------------------------------
+2022-02-05 04:32:29,353 EPOCH 3 done: loss 0.1120 - lr 0.0000389
+2022-02-05 04:35:09,639 DEV : loss 0.016585879027843475 - f1-score (micro avg) 0.9229
+2022-02-05 04:35:09,698 BAD EPOCHS (no improvement): 4
+2022-02-05 04:35:09,698 ----------------------------------------------------------------------------------------------------
+2022-02-05 04:41:46,821 epoch 4 - iter 793/7936 - loss 0.09739851 - samples/sec: 31.96 - lr: 0.000038
+2022-02-05 04:48:23,504 epoch 4 - iter 1586/7936 - loss 0.09750632 - samples/sec: 31.99 - lr: 0.000038
+2022-02-05 04:55:05,833 epoch 4 - iter 2379/7936 - loss 0.09636659 - samples/sec: 31.54 - lr: 0.000037
+2022-02-05 05:01:34,951 epoch 4 - iter 3172/7936 - loss 0.09583742 - samples/sec: 32.61 - lr: 0.000037
+2022-02-05 05:08:07,163 epoch 4 - iter 3965/7936 - loss 0.09518243 - samples/sec: 32.36 - lr: 0.000036
+2022-02-05 05:14:50,781 epoch 4 - iter 4758/7936 - loss 0.09444265 - samples/sec: 31.44 - lr: 0.000036
+2022-02-05 05:21:24,983 epoch 4 - iter 5551/7936 - loss 0.09374740 - samples/sec: 32.19 - lr: 0.000035
+2022-02-05 05:27:54,052 epoch 4 - iter 6344/7936 - loss 0.09321236 - samples/sec: 32.62 - lr: 0.000034
+2022-02-05 05:34:32,228 epoch 4 - iter 7137/7936 - loss 0.09231997 - samples/sec: 31.87 - lr: 0.000034
+2022-02-05 05:41:08,580 epoch 4 - iter 7930/7936 - loss 0.09147929 - samples/sec: 32.02 - lr: 0.000033
+2022-02-05 05:41:11,479 ----------------------------------------------------------------------------------------------------
+2022-02-05 05:41:11,479 EPOCH 4 done: loss 0.0915 - lr 0.0000333
+2022-02-05 05:44:00,197 DEV : loss 0.016923826187849045 - f1-score (micro avg) 0.9213
+2022-02-05 05:44:00,256 BAD EPOCHS (no improvement): 4
+2022-02-05 05:44:00,270 ----------------------------------------------------------------------------------------------------
+2022-02-05 05:50:27,537 epoch 5 - iter 793/7936 - loss 0.07986125 - samples/sec: 32.77 - lr: 0.000033
+2022-02-05 05:56:56,203 epoch 5 - iter 1586/7936 - loss 0.08031745 - samples/sec: 32.65 - lr: 0.000032
+2022-02-05 06:03:34,109 epoch 5 - iter 2379/7936 - loss 0.07984185 - samples/sec: 31.89 - lr: 0.000032
+2022-02-05 06:10:03,550 epoch 5 - iter 3172/7936 - loss 0.07905074 - samples/sec: 32.59 - lr: 0.000031
+2022-02-05 06:16:30,085 epoch 5 - iter 3965/7936 - loss 0.07843193 - samples/sec: 32.83 - lr: 0.000031
+2022-02-05 06:23:10,671 epoch 5 - iter 4758/7936 - loss 0.07785540 - samples/sec: 31.68 - lr: 0.000030
+2022-02-05 06:29:45,063 epoch 5 - iter 5551/7936 - loss 0.07709413 - samples/sec: 32.18 - lr: 0.000029
+2022-02-05 06:36:23,513 epoch 5 - iter 6344/7936 - loss 0.07634510 - samples/sec: 31.85 - lr: 0.000029
+2022-02-05 06:42:51,615 epoch 5 - iter 7137/7936 - loss 0.07566508 - samples/sec: 32.70 - lr: 0.000028
+2022-02-05 06:49:23,409 epoch 5 - iter 7930/7936 - loss 0.07495508 - samples/sec: 32.39 - lr: 0.000028
+2022-02-05 06:49:26,372 ----------------------------------------------------------------------------------------------------
+2022-02-05 06:49:26,373 EPOCH 5 done: loss 0.0750 - lr 0.0000278
+2022-02-05 06:52:15,459 DEV : loss 0.017464155331254005 - f1-score (micro avg) 0.9311
+2022-02-05 06:52:15,518 BAD EPOCHS (no improvement): 4
+2022-02-05 06:52:15,518 ----------------------------------------------------------------------------------------------------
+2022-02-05 06:58:49,072 epoch 6 - iter 793/7936 - loss 0.06552824 - samples/sec: 32.25 - lr: 0.000027
+2022-02-05 07:05:27,796 epoch 6 - iter 1586/7936 - loss 0.06569517 - samples/sec: 31.83 - lr: 0.000027
+2022-02-05 07:11:58,162 epoch 6 - iter 2379/7936 - loss 0.06536467 - samples/sec: 32.51 - lr: 0.000026
+2022-02-05 07:18:25,878 epoch 6 - iter 3172/7936 - loss 0.06467146 - samples/sec: 32.73 - lr: 0.000026
+2022-02-05 07:25:10,562 epoch 6 - iter 3965/7936 - loss 0.06426965 - samples/sec: 31.36 - lr: 0.000025
+2022-02-05 07:31:39,437 epoch 6 - iter 4758/7936 - loss 0.06371305 - samples/sec: 32.63 - lr: 0.000024
+2022-02-05 07:38:08,323 epoch 6 - iter 5551/7936 - loss 0.06328229 - samples/sec: 32.63 - lr: 0.000024
+2022-02-05 07:44:52,176 epoch 6 - iter 6344/7936 - loss 0.06272143 - samples/sec: 31.42 - lr: 0.000023
+2022-02-05 07:51:20,507 epoch 6 - iter 7137/7936 - loss 0.06218937 - samples/sec: 32.68 - lr: 0.000023
+2022-02-05 07:57:52,828 epoch 6 - iter 7930/7936 - loss 0.06175113 - samples/sec: 32.35 - lr: 0.000022
+2022-02-05 07:57:55,686 ----------------------------------------------------------------------------------------------------
+2022-02-05 07:57:55,687 EPOCH 6 done: loss 0.0617 - lr 0.0000222
+2022-02-05 08:00:45,565 DEV : loss 0.01982131227850914 - f1-score (micro avg) 0.9358
+2022-02-05 08:00:45,625 BAD EPOCHS (no improvement): 4
+2022-02-05 08:00:45,644 ----------------------------------------------------------------------------------------------------
+2022-02-05 08:07:26,967 epoch 7 - iter 793/7936 - loss 0.05520420 - samples/sec: 31.62 - lr: 0.000022
+2022-02-05 08:13:58,782 epoch 7 - iter 1586/7936 - loss 0.05522964 - samples/sec: 32.39 - lr: 0.000021
+2022-02-05 08:20:32,705 epoch 7 - iter 2379/7936 - loss 0.05482898 - samples/sec: 32.21 - lr: 0.000021
+2022-02-05 08:27:14,353 epoch 7 - iter 3172/7936 - loss 0.05433105 - samples/sec: 31.59 - lr: 0.000020
+2022-02-05 08:33:45,236 epoch 7 - iter 3965/7936 - loss 0.05397125 - samples/sec: 32.47 - lr: 0.000019
+2022-02-05 08:40:14,072 epoch 7 - iter 4758/7936 - loss 0.05348281 - samples/sec: 32.64 - lr: 0.000019
+2022-02-05 08:46:52,674 epoch 7 - iter 5551/7936 - loss 0.05316673 - samples/sec: 31.84 - lr: 0.000018
+2022-02-05 08:53:20,653 epoch 7 - iter 6344/7936 - loss 0.05275831 - samples/sec: 32.71 - lr: 0.000018
+2022-02-05 08:59:52,741 epoch 7 - iter 7137/7936 - loss 0.05230036 - samples/sec: 32.37 - lr: 0.000017
+2022-02-05 09:06:38,983 epoch 7 - iter 7930/7936 - loss 0.05190552 - samples/sec: 31.24 - lr: 0.000017
+2022-02-05 09:06:41,639 ----------------------------------------------------------------------------------------------------
+2022-02-05 09:06:41,639 EPOCH 7 done: loss 0.0519 - lr 0.0000167
+2022-02-05 09:09:20,864 DEV : loss 0.02467426098883152 - f1-score (micro avg) 0.9355
+2022-02-05 09:09:20,924 BAD EPOCHS (no improvement): 4
+2022-02-05 09:09:20,939 ----------------------------------------------------------------------------------------------------
+2022-02-05 09:16:05,134 epoch 8 - iter 793/7936 - loss 0.04726178 - samples/sec: 31.40 - lr: 0.000016
+2022-02-05 09:22:33,870 epoch 8 - iter 1586/7936 - loss 0.04719666 - samples/sec: 32.64 - lr: 0.000016
+2022-02-05 09:29:02,929 epoch 8 - iter 2379/7936 - loss 0.04663752 - samples/sec: 32.62 - lr: 0.000015
+2022-02-05 09:35:42,369 epoch 8 - iter 3172/7936 - loss 0.04634901 - samples/sec: 31.77 - lr: 0.000014
+2022-02-05 09:42:14,843 epoch 8 - iter 3965/7936 - loss 0.04602895 - samples/sec: 32.33 - lr: 0.000014
+2022-02-05 09:48:48,062 epoch 8 - iter 4758/7936 - loss 0.04582764 - samples/sec: 32.27 - lr: 0.000013
+2022-02-05 09:55:28,863 epoch 8 - iter 5551/7936 - loss 0.04566599 - samples/sec: 31.66 - lr: 0.000013
+2022-02-05 10:01:52,699 epoch 8 - iter 6344/7936 - loss 0.04545939 - samples/sec: 33.06 - lr: 0.000012
+2022-02-05 10:08:33,137 epoch 8 - iter 7137/7936 - loss 0.04526206 - samples/sec: 31.69 - lr: 0.000012
+2022-02-05 10:15:07,241 epoch 8 - iter 7930/7936 - loss 0.04503385 - samples/sec: 32.20 - lr: 0.000011
+2022-02-05 10:15:10,600 ----------------------------------------------------------------------------------------------------
+2022-02-05 10:15:10,600 EPOCH 8 done: loss 0.0450 - lr 0.0000111
+2022-02-05 10:18:00,280 DEV : loss 0.02364770695567131 - f1-score (micro avg) 0.9371
+2022-02-05 10:18:00,339 BAD EPOCHS (no improvement): 4
+2022-02-05 10:18:00,358 ----------------------------------------------------------------------------------------------------
+2022-02-05 10:24:31,011 epoch 9 - iter 793/7936 - loss 0.04122325 - samples/sec: 32.48 - lr: 0.000011
+2022-02-05 10:31:00,279 epoch 9 - iter 1586/7936 - loss 0.04130931 - samples/sec: 32.60 - lr: 0.000010
+2022-02-05 10:37:40,369 epoch 9 - iter 2379/7936 - loss 0.04131112 - samples/sec: 31.72 - lr: 0.000009
+2022-02-05 10:44:11,067 epoch 9 - iter 3172/7936 - loss 0.04141124 - samples/sec: 32.48 - lr: 0.000009
+2022-02-05 10:50:41,270 epoch 9 - iter 3965/7936 - loss 0.04120608 - samples/sec: 32.52 - lr: 0.000008
+2022-02-05 10:57:24,718 epoch 9 - iter 4758/7936 - loss 0.04108655 - samples/sec: 31.45 - lr: 0.000008
+2022-02-05 11:04:00,581 epoch 9 - iter 5551/7936 - loss 0.04093370 - samples/sec: 32.06 - lr: 0.000007
+2022-02-05 11:10:31,042 epoch 9 - iter 6344/7936 - loss 0.04078404 - samples/sec: 32.50 - lr: 0.000007
+2022-02-05 11:17:13,751 epoch 9 - iter 7137/7936 - loss 0.04061073 - samples/sec: 31.51 - lr: 0.000006
+2022-02-05 11:23:44,231 epoch 9 - iter 7930/7936 - loss 0.04050638 - samples/sec: 32.50 - lr: 0.000006
+2022-02-05 11:23:47,941 ----------------------------------------------------------------------------------------------------
+2022-02-05 11:23:47,942 EPOCH 9 done: loss 0.0405 - lr 0.0000056
+2022-02-05 11:26:37,114 DEV : loss 0.026182951405644417 - f1-score (micro avg) 0.9361
+2022-02-05 11:26:37,173 BAD EPOCHS (no improvement): 4
+2022-02-05 11:26:37,186 ----------------------------------------------------------------------------------------------------
+2022-02-05 11:33:05,778 epoch 10 - iter 793/7936 - loss 0.03876526 - samples/sec: 32.66 - lr: 0.000005
+2022-02-05 11:39:45,501 epoch 10 - iter 1586/7936 - loss 0.03871561 - samples/sec: 31.75 - lr: 0.000004
+2022-02-05 11:46:18,242 epoch 10 - iter 2379/7936 - loss 0.03842790 - samples/sec: 32.31 - lr: 0.000004
+2022-02-05 11:52:48,370 epoch 10 - iter 3172/7936 - loss 0.03820246 - samples/sec: 32.53 - lr: 0.000003
+2022-02-05 11:59:28,420 epoch 10 - iter 3965/7936 - loss 0.03807900 - samples/sec: 31.72 - lr: 0.000003
+2022-02-05 12:05:57,882 epoch 10 - iter 4758/7936 - loss 0.03798954 - samples/sec: 32.58 - lr: 0.000002
+2022-02-05 12:12:25,766 epoch 10 - iter 5551/7936 - loss 0.03803371 - samples/sec: 32.72 - lr: 0.000002
+2022-02-05 12:19:03,411 epoch 10 - iter 6344/7936 - loss 0.03805844 - samples/sec: 31.91 - lr: 0.000001
+2022-02-05 12:25:27,539 epoch 10 - iter 7137/7936 - loss 0.03799490 - samples/sec: 33.04 - lr: 0.000001
+2022-02-05 12:31:55,442 epoch 10 - iter 7930/7936 - loss 0.03798541 - samples/sec: 32.71 - lr: 0.000000
+2022-02-05 12:31:58,461 ----------------------------------------------------------------------------------------------------
+2022-02-05 12:31:58,462 EPOCH 10 done: loss 0.0380 - lr 0.0000000
+2022-02-05 12:34:45,700 DEV : loss 0.027400659397244453 - f1-score (micro avg) 0.9368
+2022-02-05 12:34:45,760 BAD EPOCHS (no improvement): 4
+2022-02-05 12:34:46,755 ----------------------------------------------------------------------------------------------------
+2022-02-05 12:34:46,757 Testing using last state of model ...
+2022-02-05 12:37:34,421 0.9329 0.9323 0.9326 0.8893
+2022-02-05 12:37:34,422
+Results:
+- F-score (micro) 0.9326
+- F-score (macro) 0.9111
+- Accuracy 0.8893
+
+By class:
+              precision    recall  f1-score   support
+
+        pers     0.9355    0.9279    0.9317      2734
+         loc     0.9242    0.9335    0.9288      1384
+      amount     0.9800    0.9800    0.9800       250
+        time     0.9456    0.9576    0.9516       236
+        func     0.9333    0.9000    0.9164       140
+         org     0.8148    0.8980    0.8544        49
+        prod     0.8621    0.9259    0.8929        27
+       event     0.8333    0.8333    0.8333        12
+
+   micro avg     0.9329    0.9323    0.9326      4832
+   macro avg     0.9036    0.9195    0.9111      4832
+weighted avg     0.9331    0.9323    0.9327      4832
+ samples avg     0.8893    0.8893    0.8893      4832
+
+2022-02-05 12:37:34,422 ----------------------------------------------------------------------------------------------------
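
The log pins down the full configuration: a SequenceTagger head (a single linear layer, no RNN or CRF) over a 12-layer RoBERTa encoder, trained for 10 epochs at a peak learning rate of 5e-05 with mini-batches of 16, the learning rate warming up over the first epoch and then decaying linearly to zero. A rough reconstruction of the training script under those settings; the corpus path and the transformer model id are placeholders (the log only shows an anonymous RobertaModel with a 32768-token vocabulary), and the warm-up/decay schedule is assumed to come from Flair's fine-tuning mode:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical corpus layout; the actual splits are the
# 126973/7037/7090-sentence train/dev/test sets named in the log.
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

# Placeholder model id for the underlying French RoBERTa checkpoint.
embeddings = TransformerWordEmbeddings("<roberta-model-id>", fine_tune=True)

# Matches the printed architecture: a plain linear classifier on top of
# the transformer, with word/locked dropout but no RNN or CRF layers.
tagger = SequenceTagger(
    hidden_size=256,  # required by the constructor, unused with use_rnn=False
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_rnn=False,
    use_crf=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ner-dalembert-2ndtry",
    learning_rate=5e-05,
    mini_batch_size=16,
    max_epochs=10,
)
```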
weights.txt
ADDED
File without changes