| end of split   1 /113 | epoch   1 | time: 224.45s | valid loss 7.6183 | valid ppl 2035.0861 | learning rate 5.0000
| end of split   2 /113 | epoch   1 | time: 229.45s | valid loss 7.3864 | valid ppl 1613.9065 | learning rate 5.0000
| end of split   3 /113 | epoch   1 | time: 239.40s | valid loss 7.3424 | valid ppl 1544.3504 | learning rate 5.0000
| end of split   4 /113 | epoch   1 | time: 233.67s | valid loss 7.2568 | valid ppl 1417.6838 | learning rate 5.0000
| end of split   5 /113 | epoch   1 | time: 227.57s | valid loss 7.2848 | valid ppl 1458.0133 | learning rate 5.0000
| end of split   6 /113 | epoch   1 | time: 235.49s | valid loss 7.2458 | valid ppl 1402.2080 | learning rate 5.0000
| end of split   7 /113 | epoch   1 | time: 235.14s | valid loss 7.2137 | valid ppl 1357.8841 | learning rate 5.0000
| end of split   8 /113 | epoch   1 | time: 238.90s | valid loss 7.1989 | valid ppl 1337.9002 | learning rate 5.0000
| end of split   9 /113 | epoch   1 | time: 228.81s | valid loss 7.1782 | valid ppl 1310.5202 | learning rate 5.0000
| end of split  10 /113 | epoch   1 | time: 230.95s | valid loss 7.1692 | valid ppl 1298.8697 | learning rate 5.0000
| end of split  11 /113 | epoch   1 | time: 231.70s | valid loss 7.1442 | valid ppl 1266.7305 | learning rate 5.0000
| end of split  12 /113 | epoch   1 | time: 240.42s | valid loss 7.1839 | valid ppl 1317.9954 | learning rate 5.0000
| end of split  13 /113 | epoch   1 | time: 235.25s | valid loss 7.2127 | valid ppl 1356.5282 | learning rate 5.0000
| end of split  14 /113 | epoch   1 | time: 232.67s | valid loss 7.2704 | valid ppl 1437.1488 | learning rate 5.0000
| end of split  15 /113 | epoch   1 | time: 229.99s | valid loss 7.1410 | valid ppl 1262.7434 | learning rate 5.0000
| end of split  16 /113 | epoch   1 | time: 230.24s | valid loss 7.2028 | valid ppl 1343.1933 | learning rate 5.0000
| end of split  17 /113 | epoch   1 | time: 48.80s | valid loss 7.1864 | valid ppl 1321.2975 | learning rate 5.0000
| end of split  18 /113 | epoch   1 | time: 238.71s | valid loss 7.1344 | valid ppl 1254.4124 | learning rate 5.0000
| end of split  19 /113 | epoch   1 | time: 238.74s | valid loss 7.1402 | valid ppl 1261.6803 | learning rate 5.0000
| end of split  20 /113 | epoch   1 | time: 230.88s | valid loss 7.2222 | valid ppl 1369.5573 | learning rate 5.0000
| end of split  21 /113 | epoch   1 | time: 235.01s | valid loss 7.1024 | valid ppl 1214.8458 | learning rate 5.0000
| end of split  22 /113 | epoch   1 | time: 233.22s | valid loss 7.1523 | valid ppl 1277.0068 | learning rate 5.0000
| end of split  23 /113 | epoch   1 | time: 234.10s | valid loss 7.1516 | valid ppl 1276.1012 | learning rate 5.0000
| end of split  24 /113 | epoch   1 | time: 234.94s | valid loss 7.1347 | valid ppl 1254.7220 | learning rate 5.0000
| end of split  25 /113 | epoch   1 | time: 232.93s | valid loss 7.1199 | valid ppl 1236.2833 | learning rate 5.0000
| end of split  26 /113 | epoch   1 | time: 234.40s | valid loss 7.1184 | valid ppl 1234.5018 | learning rate 5.0000
| end of split  27 /113 | epoch   1 | time: 237.28s | valid loss 7.1083 | valid ppl 1222.0958 | learning rate 5.0000
| end of split  28 /113 | epoch   1 | time: 231.57s | valid loss 7.1589 | valid ppl 1285.4715 | learning rate 5.0000
| end of split  29 /113 | epoch   1 | time: 232.64s | valid loss 7.1232 | valid ppl 1240.4354 | learning rate 5.0000
| end of split  30 /113 | epoch   1 | time: 238.52s | valid loss 7.0960 | valid ppl 1207.1889 | learning rate 5.0000
| end of split  31 /113 | epoch   1 | time: 235.86s | valid loss 7.1294 | valid ppl 1248.0873 | learning rate 5.0000
| end of split  32 /113 | epoch   1 | time: 234.67s | valid loss 7.1366 | valid ppl 1257.1105 | learning rate 5.0000
| end of split  33 /113 | epoch   1 | time: 236.46s | valid loss 7.0806 | valid ppl 1188.6487 | learning rate 5.0000
| end of split  34 /113 | epoch   1 | time: 231.14s | valid loss 7.1160 | valid ppl 1231.4851 | learning rate 5.0000
| end of split  35 /113 | epoch   1 | time: 236.11s | valid loss 7.1426 | valid ppl 1264.6883 | learning rate 5.0000
| end of split  36 /113 | epoch   1 | time: 232.98s | valid loss 7.1442 | valid ppl 1266.7118 | learning rate 5.0000
| end of split  37 /113 | epoch   1 | time: 235.77s | valid loss 7.1382 | valid ppl 1259.1016 | learning rate 5.0000
| end of split  38 /113 | epoch   1 | time: 235.38s | valid loss 7.0742 | valid ppl 1181.0755 | learning rate 5.0000
| end of split  39 /113 | epoch   1 | time: 230.26s | valid loss 7.1081 | valid ppl 1221.7934 | learning rate 5.0000
| end of split  40 /113 | epoch   1 | time: 233.25s | valid loss 7.0893 | valid ppl 1199.0533 | learning rate 5.0000
| end of split  41 /113 | epoch   1 | time: 232.96s | valid loss 7.0886 | valid ppl 1198.2460 | learning rate 5.0000
| end of split  42 /113 | epoch   1 | time: 233.86s | valid loss 7.1457 | valid ppl 1268.6031 | learning rate 5.0000
| end of split  43 /113 | epoch   1 | time: 234.62s | valid loss 7.1386 | valid ppl 1259.6532 | learning rate 5.0000
| end of split  44 /113 | epoch   1 | time: 232.69s | valid loss 7.0900 | valid ppl 1199.9118 | learning rate 5.0000
| end of split  45 /113 | epoch   1 | time: 230.84s | valid loss 7.1523 | valid ppl 1276.9780 | learning rate 5.0000
| end of split  46 /113 | epoch   1 | time: 231.71s | valid loss 7.1219 | valid ppl 1238.7760 | learning rate 5.0000
| end of split  47 /113 | epoch   1 | time: 230.86s | valid loss 7.0811 | valid ppl 1189.2806 | learning rate 5.0000
| end of split  48 /113 | epoch   1 | time: 232.63s | valid loss 7.1543 | valid ppl 1279.6527 | learning rate 5.0000
| end of split  49 /113 | epoch   1 | time: 233.86s | valid loss 7.0683 | valid ppl 1174.0986 | learning rate 5.0000
| end of split  50 /113 | epoch   1 | time: 229.15s | valid loss 7.0550 | valid ppl 1158.6403 | learning rate 5.0000
| end of split  51 /113 | epoch   1 | time: 236.63s | valid loss 7.1117 | valid ppl 1226.2546 | learning rate 5.0000
| end of split  52 /113 | epoch   1 | time: 238.10s | valid loss 7.1026 | valid ppl 1215.1584 | learning rate 5.0000
| end of split  53 /113 | epoch   1 | time: 232.74s | valid loss 7.0969 | valid ppl 1208.2648 | learning rate 5.0000
| end of split  54 /113 | epoch   1 | time: 238.09s | valid loss 7.0846 | valid ppl 1193.4612 | learning rate 5.0000
| end of split  55 /113 | epoch   1 | time: 233.70s | valid loss 7.1157 | valid ppl 1231.1284 | learning rate 5.0000
| end of split  56 /113 | epoch   1 | time: 230.09s | valid loss 7.0540 | valid ppl 1157.4801 | learning rate 5.0000
| end of split  57 /113 | epoch   1 | time: 235.27s | valid loss 7.0783 | valid ppl 1185.9658 | learning rate 5.0000
| end of split  58 /113 | epoch   1 | time: 233.74s | valid loss 7.1189 | valid ppl 1235.0774 | learning rate 5.0000
| end of split  59 /113 | epoch   1 | time: 229.77s | valid loss 7.0364 | valid ppl 1137.2668 | learning rate 5.0000
| end of split  60 /113 | epoch   1 | time: 233.24s | valid loss 7.0514 | valid ppl 1154.5030 | learning rate 5.0000
| end of split  61 /113 | epoch   1 | time: 236.63s | valid loss 7.1055 | valid ppl 1218.6020 | learning rate 5.0000
| end of split  62 /113 | epoch   1 | time: 233.17s | valid loss 7.1210 | valid ppl 1237.6443 | learning rate 5.0000
| end of split  63 /113 | epoch   1 | time: 234.66s | valid loss 7.0762 | valid ppl 1183.4137 | learning rate 5.0000
| end of split  64 /113 | epoch   1 | time: 232.58s | valid loss 7.1240 | valid ppl 1241.4370 | learning rate 5.0000
| end of split  65 /113 | epoch   1 | time: 231.51s | valid loss 7.0930 | valid ppl 1203.5000 | learning rate 5.0000
| end of split  66 /113 | epoch   1 | time: 232.26s | valid loss 7.1001 | valid ppl 1212.0637 | learning rate 5.0000
| end of split  67 /113 | epoch   1 | time: 228.92s | valid loss 7.0738 | valid ppl 1180.6015 | learning rate 5.0000
| end of split  68 /113 | epoch   1 | time: 230.60s | valid loss 7.1206 | valid ppl 1237.2528 | learning rate 5.0000
| end of split  69 /113 | epoch   1 | time: 232.29s | valid loss 7.1268 | valid ppl 1244.8903 | learning rate 5.0000
| end of split  70 /113 | epoch   1 | time: 234.60s | valid loss 7.1138 | valid ppl 1228.8092 | learning rate 5.0000
| end of split  71 /113 | epoch   1 | time: 231.33s | valid loss 7.0736 | valid ppl 1180.4231 | learning rate 5.0000
| end of split  72 /113 | epoch   1 | time: 235.50s | valid loss 7.0407 | valid ppl 1142.1916 | learning rate 5.0000
| end of split  73 /113 | epoch   1 | time: 230.23s | valid loss 7.0512 | valid ppl 1154.2604 | learning rate 5.0000
| end of split  74 /113 | epoch   1 | time: 239.00s | valid loss 7.1215 | valid ppl 1238.2501 | learning rate 5.0000
| end of split  75 /113 | epoch   1 | time: 234.03s | valid loss 7.1852 | valid ppl 1319.7906 | learning rate 5.0000
| end of split  76 /113 | epoch   1 | time: 234.28s | valid loss 7.0916 | valid ppl 1201.8453 | learning rate 5.0000
| end of split  77 /113 | epoch   1 | time: 235.71s | valid loss 7.0874 | valid ppl 1196.7356 | learning rate 5.0000
| end of split  78 /113 | epoch   1 | time: 237.06s | valid loss 7.1335 | valid ppl 1253.2911 | learning rate 5.0000
| end of split  79 /113 | epoch   1 | time: 233.74s | valid loss 7.1122 | valid ppl 1226.8927 | learning rate 5.0000
| end of split  80 /113 | epoch   1 | time: 233.17s | valid loss 7.1309 | valid ppl 1250.0614 | learning rate 5.0000
| end of split  81 /113 | epoch   1 | time: 232.30s | valid loss 7.0873 | valid ppl 1196.7297 | learning rate 5.0000
| end of split  82 /113 | epoch   1 | time: 231.22s | valid loss 7.1370 | valid ppl 1257.6055 | learning rate 5.0000
| end of split  83 /113 | epoch   1 | time: 231.43s | valid loss 7.0576 | valid ppl 1161.6918 | learning rate 5.0000
| end of split  84 /113 | epoch   1 | time: 235.02s | valid loss 7.0657 | valid ppl 1171.0550 | learning rate 5.0000
| end of split  85 /113 | epoch   1 | time: 234.79s | valid loss 7.1117 | valid ppl 1226.2184 | learning rate 5.0000
| end of split  86 /113 | epoch   1 | time: 239.30s | valid loss 7.0911 | valid ppl 1201.2320 | learning rate 5.0000
| end of split  87 /113 | epoch   1 | time: 230.62s | valid loss 7.0994 | valid ppl 1211.2212 | learning rate 5.0000
| end of split  88 /113 | epoch   1 | time: 231.93s | valid loss 7.1275 | valid ppl 1245.7974 | learning rate 5.0000
| end of split  89 /113 | epoch   1 | time: 231.13s | valid loss 7.0923 | valid ppl 1202.6127 | learning rate 5.0000
| end of split  90 /113 | epoch   1 | time: 236.74s | valid loss 7.1520 | valid ppl 1276.6935 | learning rate 5.0000
| end of split  91 /113 | epoch   1 | time: 232.98s | valid loss 7.1159 | valid ppl 1231.3526 | learning rate 5.0000
| end of split  92 /113 | epoch   1 | time: 236.25s | valid loss 7.1405 | valid ppl 1262.0972 | learning rate 5.0000
| end of split  93 /113 | epoch   1 | time: 234.62s | valid loss 7.0885 | valid ppl 1198.1424 | learning rate 5.0000
| end of split  94 /113 | epoch   1 | time: 233.59s | valid loss 7.1003 | valid ppl 1212.3560 | learning rate 5.0000
| end of split  95 /113 | epoch   1 | time: 233.27s | valid loss 7.1059 | valid ppl 1219.0888 | learning rate 5.0000
| end of split  96 /113 | epoch   1 | time: 231.78s | valid loss 7.1232 | valid ppl 1240.4668 | learning rate 5.0000
| end of split  97 /113 | epoch   1 | time: 235.60s | valid loss 7.1186 | valid ppl 1234.7345 | learning rate 5.0000
| end of split  98 /113 | epoch   1 | time: 233.88s | valid loss 7.1161 | valid ppl 1231.6487 | learning rate 5.0000
| end of split  99 /113 | epoch   1 | time: 236.68s | valid loss 7.1076 | valid ppl 1221.1639 | learning rate 5.0000
| end of split 100 /113 | epoch   1 | time: 232.62s | valid loss 7.0984 | valid ppl 1210.0832 | learning rate 5.0000
| end of split 101 /113 | epoch   1 | time: 233.49s | valid loss 7.1288 | valid ppl 1247.4030 | learning rate 5.0000
| end of split 102 /113 | epoch   1 | time: 232.34s | valid loss 7.0934 | valid ppl 1204.0527 | learning rate 5.0000
| end of split 103 /113 | epoch   1 | time: 230.64s | valid loss 7.1062 | valid ppl 1219.4642 | learning rate 5.0000
| end of split 104 /113 | epoch   1 | time: 235.83s | valid loss 7.1531 | valid ppl 1278.0091 | learning rate 5.0000
| end of split 105 /113 | epoch   1 | time: 230.35s | valid loss 7.1200 | valid ppl 1236.4884 | learning rate 5.0000
| end of split 106 /113 | epoch   1 | time: 231.68s | valid loss 7.1236 | valid ppl 1240.9623 | learning rate 5.0000
| end of split 107 /113 | epoch   1 | time: 236.04s | valid loss 7.0998 | valid ppl 1211.7024 | learning rate 5.0000
| end of split 108 /113 | epoch   1 | time: 231.16s | valid loss 7.1267 | valid ppl 1244.7170 | learning rate 5.0000
| end of split 109 /113 | epoch   1 | time: 235.80s | valid loss 7.1114 | valid ppl 1225.8615 | learning rate 5.0000
| end of split 110 /113 | epoch   1 | time: 229.11s | valid loss 7.0848 | valid ppl 1193.6844 | learning rate 5.0000
| end of split 111 /113 | epoch   1 | time: 232.32s | valid loss 7.0782 | valid ppl 1185.7957 | learning rate 1.2500
| end of split 112 /113 | epoch   1 | time: 232.60s | valid loss 7.0965 | valid ppl 1207.7586 | learning rate 1.2500
| end of split 113 /113 | epoch   1 | time: 237.25s | valid loss 7.1007 | valid ppl 1212.7755 | learning rate 1.2500
| end of split   1 /113 | epoch   2 | time: 229.76s | valid loss 7.0779 | valid ppl 1185.4298 | learning rate 1.2500
| end of split   2 /113 | epoch   2 | time: 232.20s | valid loss 7.0994 | valid ppl 1211.1846 | learning rate 1.2500
| end of split   3 /113 | epoch   2 | time: 230.39s | valid loss 7.0802 | valid ppl 1188.2092 | learning rate 1.2500
| end of split   4 /113 | epoch   2 | time: 232.46s | valid loss 7.0951 | valid ppl 1205.9962 | learning rate 1.2500
| end of split   5 /113 | epoch   2 | time: 232.66s | valid loss 7.1047 | valid ppl 1217.6557 | learning rate 1.2500
| end of split   6 /113 | epoch   2 | time: 231.54s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500
| end of split   7 /113 | epoch   2 | time: 234.75s | valid loss 7.1142 | valid ppl 1229.3492 | learning rate 1.2500
| end of split   8 /113 | epoch   2 | time: 235.30s | valid loss 7.0901 | valid ppl 1200.0375 | learning rate 1.2500
| end of split   9 /113 | epoch   2 | time: 235.81s | valid loss 7.0971 | valid ppl 1208.4907 | learning rate 1.2500
| end of split  10 /113 | epoch   2 | time: 230.40s | valid loss 7.0927 | valid ppl 1203.1642 | learning rate 1.2500
| end of split  11 /113 | epoch   2 | time: 235.86s | valid loss 7.1028 | valid ppl 1215.3789 | learning rate 1.2500
| end of split  12 /113 | epoch   2 | time: 230.91s | valid loss 7.0949 | valid ppl 1205.7953 | learning rate 1.2500
| end of split  13 /113 | epoch   2 | time: 233.88s | valid loss 7.0789 | valid ppl 1186.6439 | learning rate 1.2500
| end of split  14 /113 | epoch   2 | time: 232.71s | valid loss 7.0946 | valid ppl 1205.4994 | learning rate 1.2500
| end of split  15 /113 | epoch   2 | time: 230.99s | valid loss 7.0850 | valid ppl 1193.9639 | learning rate 1.2500
| end of split  16 /113 | epoch   2 | time: 227.77s | valid loss 7.1121 | valid ppl 1226.6969 | learning rate 1.2500
| end of split  17 /113 | epoch   2 | time: 235.85s | valid loss 7.0980 | valid ppl 1209.5941 | learning rate 1.2500
| end of split  18 /113 | epoch   2 | time: 235.06s | valid loss 7.0815 | valid ppl 1189.7783 | learning rate 1.2500
| end of split  19 /113 | epoch   2 | time: 237.29s | valid loss 7.1028 | valid ppl 1215.3490 | learning rate 1.2500
| end of split  20 /113 | epoch   2 | time: 235.29s | valid loss 7.0942 | valid ppl 1204.9817 | learning rate 1.2500
| end of split  21 /113 | epoch   2 | time: 231.22s | valid loss 7.0837 | valid ppl 1192.3273 | learning rate 1.2500
| end of split  22 /113 | epoch   2 | time: 235.58s | valid loss 7.0989 | valid ppl 1210.6321 | learning rate 1.2500
| end of split  23 /113 | epoch   2 | time: 232.62s | valid loss 7.0947 | valid ppl 1205.5749 | learning rate 1.2500
| end of split  24 /113 | epoch   2 | time: 238.49s | valid loss 7.1007 | valid ppl 1212.8266 | learning rate 1.2500
| end of split  25 /113 | epoch   2 | time: 228.89s | valid loss 7.0794 | valid ppl 1187.2814 | learning rate 1.2500
| end of split  26 /113 | epoch   2 | time: 231.21s | valid loss 7.0910 | valid ppl 1201.0850 | learning rate 1.2500
| end of split  27 /113 | epoch   2 | time: 236.23s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500
| end of split  28 /113 | epoch   2 | time: 234.70s | valid loss 7.0858 | valid ppl 1194.8918 | learning rate 1.2500
| end of split  29 /113 | epoch   2 | time: 229.67s | valid loss 7.0637 | valid ppl 1168.7198 | learning rate 1.2500
| end of split  30 /113 | epoch   2 | time: 230.59s | valid loss 7.1101 | valid ppl 1224.2250 | learning rate 1.2500
| end of split  31 /113 | epoch   2 | time: 232.68s | valid loss 7.0836 | valid ppl 1192.2460 | learning rate 1.2500
| end of split  32 /113 | epoch   2 | time: 231.80s | valid loss 7.1094 | valid ppl 1223.3879 | learning rate 1.2500
| end of split  33 /113 | epoch   2 | time: 234.73s | valid loss 7.1026 | valid ppl 1215.0679 | learning rate 1.2500
| end of split  34 /113 | epoch   2 | time: 232.94s | valid loss 7.0845 | valid ppl 1193.3580 | learning rate 1.2500
| end of split  35 /113 | epoch   2 | time: 232.85s | valid loss 7.1046 | valid ppl 1217.5067 | learning rate 1.2500
| end of split  36 /113 | epoch   2 | time: 236.10s | valid loss 7.1064 | valid ppl 1219.7146 | learning rate 1.2500
| end of split  37 /113 | epoch   2 | time: 234.89s | valid loss 7.0999 | valid ppl 1211.8541 | learning rate 1.2500
| end of split  38 /113 | epoch   2 | time: 239.33s | valid loss 7.0895 | valid ppl 1199.2961 | learning rate 1.2500
| end of split  39 /113 | epoch   2 | time: 239.01s | valid loss 7.1112 | valid ppl 1225.6211 | learning rate 1.2500
| end of split  40 /113 | epoch   2 | time: 233.50s | valid loss 7.0895 | valid ppl 1199.3484 | learning rate 1.2500
| end of split  41 /113 | epoch   2 | time: 237.27s | valid loss 7.0723 | valid ppl 1178.8008 | learning rate 1.2500
| end of split  42 /113 | epoch   2 | time: 231.15s | valid loss 7.0958 | valid ppl 1206.8495 | learning rate 1.2500
| end of split  43 /113 | epoch   2 | time: 231.39s | valid loss 7.0922 | valid ppl 1202.5908 | learning rate 1.2500
| end of split  44 /113 | epoch   2 | time: 229.96s | valid loss 7.1024 | valid ppl 1214.8449 | learning rate 1.2500
| end of split  45 /113 | epoch   2 | time: 237.25s | valid loss 7.1115 | valid ppl 1226.0123 | learning rate 1.2500
| end of split  46 /113 | epoch   2 | time: 233.19s | valid loss 7.0828 | valid ppl 1191.2430 | learning rate 1.2500
| end of split  47 /113 | epoch   2 | time: 232.26s | valid loss 7.0917 | valid ppl 1201.9762 | learning rate 1.2500
| end of split  48 /113 | epoch   2 | time: 227.95s | valid loss 7.0983 | valid ppl 1209.8765 | learning rate 1.2500
| end of split  49 /113 | epoch   2 | time: 232.30s | valid loss 7.0888 | valid ppl 1198.4128 | learning rate 0.3125
| end of split  50 /113 | epoch   2 | time: 238.16s | valid loss 7.0910 | valid ppl 1201.0504 | learning rate 0.3125
| end of split  51 /113 | epoch   2 | time: 233.23s | valid loss 7.0949 | valid ppl 1205.7495 | learning rate 0.3125
| end of split  52 /113 | epoch   2 | time: 232.61s | valid loss 7.0807 | valid ppl 1188.8117 | learning rate 0.3125
| end of split  53 /113 | epoch   2 | time: 233.73s | valid loss 7.0902 | valid ppl 1200.1734 | learning rate 0.3125
| end of split  54 /113 | epoch   2 | time: 230.67s | valid loss 7.0855 | valid ppl 1194.5399 | learning rate 0.3125
| end of split  55 /113 | epoch   2 | time: 235.17s | valid loss 7.0903 | valid ppl 1200.2645 | learning rate 0.3125
| end of split  56 /113 | epoch   2 | time: 230.04s | valid loss 7.0905 | valid ppl 1200.5506 | learning rate 0.3125
| end of split  57 /113 | epoch   2 | time: 235.80s | valid loss 7.0972 | valid ppl 1208.5664 | learning rate 0.3125
| end of split  58 /113 | epoch   2 | time: 233.83s | valid loss 7.0926 | valid ppl 1203.0872 | learning rate 0.3125
| end of split  59 /113 | epoch   2 | time: 234.66s | valid loss 7.0922 | valid ppl 1202.5223 | learning rate 0.3125
| end of split  60 /113 | epoch   2 | time: 231.74s | valid loss 7.0899 | valid ppl 1199.8190 | learning rate 0.3125
| end of split  61 /113 | epoch   2 | time: 228.91s | valid loss 7.0938 | valid ppl 1204.4743 | learning rate 0.3125
| end of split  62 /113 | epoch   2 | time: 235.87s | valid loss 7.0887 | valid ppl 1198.3909 | learning rate 0.3125
| end of split  63 /113 | epoch   2 | time: 234.42s | valid loss 7.0820 | valid ppl 1190.2886 | learning rate 0.3125
| end of split  64 /113 | epoch   2 | time: 233.77s | valid loss 7.0910 | valid ppl 1201.1087 | learning rate 0.3125
| end of split  65 /113 | epoch   2 | time: 235.55s | valid loss 7.0922 | valid ppl 1202.4961 | learning rate 0.3125
| end of split  66 /113 | epoch   2 | time: 231.77s | valid loss 7.0890 | valid ppl 1198.6597 | learning rate 0.3125
| end of split  67 /113 | epoch   2 | time: 239.03s | valid loss 7.0907 | valid ppl 1200.6899 | learning rate 0.3125
| end of split  68 /113 | epoch   2 | time: 233.79s | valid loss 7.0929 | valid ppl 1203.3503 | learning rate 0.3125
| end of split  69 /113 | epoch   2 | time: 230.34s | valid loss 7.0980 | valid ppl 1209.6052 | learning rate 0.3125
| end of split  70 /113 | epoch   2 | time: 236.49s | valid loss 7.0882 | valid ppl 1197.7819 | learning rate 0.3125
| end of split  71 /113 | epoch   2 | time: 234.44s | valid loss 7.1003 | valid ppl 1212.3714 | learning rate 0.3125
| end of split  72 /113 | epoch   2 | time: 233.01s | valid loss 7.0828 | valid ppl 1191.3159 | learning rate 0.3125
| end of split  73 /113 | epoch   2 | time: 238.78s | valid loss 7.0959 | valid ppl 1207.0328 | learning rate 0.3125
| end of split  74 /113 | epoch   2 | time: 239.67s | valid loss 7.0914 | valid ppl 1201.5850 | learning rate 0.3125
| end of split  75 /113 | epoch   2 | time: 230.83s | valid loss 7.1005 | valid ppl 1212.5495 | learning rate 0.3125
| end of split  76 /113 | epoch   2 | time: 235.05s | valid loss 7.0889 | valid ppl 1198.6319 | learning rate 0.3125
| end of split  77 /113 | epoch   2 | time: 230.27s | valid loss 7.0923 | valid ppl 1202.6914 | learning rate 0.3125
| end of split  78 /113 | epoch   2 | time: 231.51s | valid loss 7.0787 | valid ppl 1186.4144 | learning rate 0.3125
| end of split  79 /113 | epoch   2 | time: 232.70s | valid loss 7.0995 | valid ppl 1211.3830 | learning rate 0.3125
| end of split  80 /113 | epoch   2 | time: 233.21s | valid loss 7.0929 | valid ppl 1203.3740 | learning rate 0.3125
| end of split  81 /113 | epoch   2 | time: 230.05s | valid loss 7.0802 | valid ppl 1188.1591 | learning rate 0.3125
| end of split  82 /113 | epoch   2 | time: 235.62s | valid loss 7.0860 | valid ppl 1195.0842 | learning rate 0.3125
| end of split  83 /113 | epoch   2 | time: 236.11s | valid loss 7.0906 | valid ppl 1200.6764 | learning rate 0.3125
| end of split  84 /113 | epoch   2 | time: 230.87s | valid loss 7.0850 | valid ppl 1193.9009 | learning rate 0.3125
| end of split  85 /113 | epoch   2 | time: 232.62s | valid loss 7.0939 | valid ppl 1204.6437 | learning rate 0.3125
| end of split  86 /113 | epoch   2 | time: 238.23s | valid loss 7.0856 | valid ppl 1194.6482 | learning rate 0.3125
| end of split  87 /113 | epoch   2 | time: 233.77s | valid loss 7.0942 | valid ppl 1205.0113 | learning rate 0.3125
| end of split  88 /113 | epoch   2 | time: 230.52s | valid loss 7.0954 | valid ppl 1206.3736 | learning rate 0.3125
| end of split  89 /113 | epoch   2 | time: 235.21s | valid loss 7.0953 | valid ppl 1206.2616 | learning rate 0.3125
| end of split  90 /113 | epoch   2 | time: 236.74s | valid loss 7.0902 | valid ppl 1200.1371 | learning rate 0.3125
| end of split  91 /113 | epoch   2 | time: 234.19s | valid loss 7.0940 | valid ppl 1204.7284 | learning rate 0.3125
| end of split  92 /113 | epoch   2 | time: 229.17s | valid loss 7.0667 | valid ppl 1172.2181 | learning rate 0.3125
| end of split  93 /113 | epoch   2 | time: 233.18s | valid loss 7.0851 | valid ppl 1193.9966 | learning rate 0.3125
| end of split  94 /113 | epoch   2 | time: 233.54s | valid loss 7.0983 | valid ppl 1209.8629 | learning rate 0.3125
| end of split  95 /113 | epoch   2 | time: 240.46s | valid loss 7.0915 | valid ppl 1201.7565 | learning rate 0.3125
| end of split  96 /113 | epoch   2 | time: 232.63s | valid loss 7.0925 | valid ppl 1202.8766 | learning rate 0.3125
| end of split  97 /113 | epoch   2 | time: 236.79s | valid loss 7.0868 | valid ppl 1196.0248 | learning rate 0.3125
| end of split  98 /113 | epoch   2 | time: 234.71s | valid loss 7.0826 | valid ppl 1191.0655 | learning rate 0.3125
| end of split  99 /113 | epoch   2 | time: 233.29s | valid loss 7.0957 | valid ppl 1206.8113 | learning rate 0.3125
| end of split 100 /113 | epoch   2 | time: 236.83s | valid loss 7.0924 | valid ppl 1202.8005 | learning rate 0.0781
| end of split 101 /113 | epoch   2 | time: 48.85s | valid loss 7.0897 | valid ppl 1199.5980 | learning rate 0.0781
| end of split 102 /113 | epoch   2 | time: 236.70s | valid loss 7.0890 | valid ppl 1198.7280 | learning rate 0.0781
| end of split 103 /113 | epoch   2 | time: 238.79s | valid loss 7.0864 | valid ppl 1195.5683 | learning rate 0.0781
| end of split 104 /113 | epoch   2 | time: 232.38s | valid loss 7.0929 | valid ppl 1203.4357 | learning rate 0.0781
| end of split 105 /113 | epoch   2 | time: 229.19s | valid loss 7.0942 | valid ppl 1204.8987 | learning rate 0.0781
| end of split 106 /113 | epoch   2 | time: 231.16s | valid loss 7.0949 | valid ppl 1205.8207 | learning rate 0.0781
| end of split 107 /113 | epoch   2 | time: 232.93s | valid loss 7.0896 | valid ppl 1199.3762 | learning rate 0.0781
| end of split 108 /113 | epoch   2 | time: 234.06s | valid loss 7.0961 | valid ppl 1207.2101 | learning rate 0.0781
| end of split 109 /113 | epoch   2 | time: 233.27s | valid loss 7.0883 | valid ppl 1197.8653 | learning rate 0.0781
| end of split 110 /113 | epoch   2 | time: 234.69s | valid loss 7.0930 | valid ppl 1203.4772 | learning rate 0.0781
| end of split 111 /113 | epoch   2 | time: 231.50s | valid loss 7.0946 | valid ppl 1205.4435 | learning rate 0.0781
| end of split 112 /113 | epoch   2 | time: 233.79s | valid loss 7.0864 | valid ppl 1195.5549 | learning rate 0.0781
| end of split 113 /113 | epoch   2 | time: 232.14s | valid loss 7.0906 | valid ppl 1200.6055 | learning rate 0.0781
TEST: valid loss 7.0908 | valid ppl 1200.8965