readme: update

Browse files:
- README.md: +69 -103
- best-lm.pt: deleted (-3)
- loss.txt: deleted (-227)
- training.log: deleted (empty)

README.md
CHANGED
@@ -2,118 +2,84 @@

Old version (removed):
---
license: cc-by-sa-3.0
language:
- de
library_name: flair
---

# xLSTM Model trained on German Wikipedia

Research & development of an xLSTM model trained on German Wikipedia.

The Flair team is currently working on the integration of xLSTM (both LM training and fine-tuning models for downstream tasks).

Check out the `xlstm` [branch in the Flair repository](https://github.com/flairNLP/flair/tree/xlstm) - many thanks to [Patrick Haller](https://huggingface.co/PatrickHaller) for the work on it.

* `train` -> Folder with text files as training corpus

The model was trained with the following parameters for 2 epochs:

```python
import flair
import torch

from flair.data import SubTokenDictionary
from flair.models import xLSTMLanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

flair.device = torch.device('cuda:0')

is_forward_lm = True

dictionary = SubTokenDictionary.load("gwlms/bert-base-dewiki-v1")

corpus = TextCorpus("/home/ubuntu/splitted_corpus",
                    dictionary,
                    is_forward_lm,
                    character_level=False,
                    random_case_flip=True,
                    )

xlstm_ablation_1 = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 2
    qkv_proj_blocksize: 2
    num_heads: 2
slstm_block:
  slstm:
    backend: cuda
    num_heads: 2
    conv1d_kernel_size: 2
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

language_model = xLSTMLanguageModel(dictionary, xlstm_cfg=xlstm_ablation_1,
                                    is_forward_lm=True)
print(language_model)

trainer = LanguageModelTrainer(language_model, corpus)

trainer.train("xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2",
              sequence_length=256,
              mini_batch_size=64,
              learning_rate=5,
              patience=50,
              max_epochs=2,
              checkpoint=False,
              num_workers=4,
              )
```
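In the `xlstm_ablation_1` configuration above, `slstm_at: [1]` places an sLSTM block at index 1 and mLSTM blocks everywhere else among the `num_blocks: 7` blocks. A tiny illustrative sketch of that layout rule (not the `xlstm` library's actual code):

```python
def block_layout(num_blocks, slstm_at):
    """Assign a block type per index: sLSTM where listed, mLSTM elsewhere."""
    return ["slstm" if i in slstm_at else "mlstm" for i in range(num_blocks)]

print(block_layout(7, [1]))
# ['mlstm', 'slstm', 'mlstm', 'mlstm', 'mlstm', 'mlstm', 'mlstm']
```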
87 |
|
88 |
-
|
89 |
-
|
90 |
-
|
91 |
-
|
92 |
-
|
93 |
-
|
94 |
-
|
95 |
-
|
96 |
-
|
97 |
-
|
98 |
-
|
99 |
-
|
100 |
-
|
101 |
-
|
102 |
-
|
103 |
-
|
104 |
-
|
105 |
-
|
106 |
-
|
107 |
-
|
108 |
-
|
109 |
-
|
110 |
-
|
111 |
-
|
112 |
-
|
113 |
-
|
114 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
115 |
```
|
116 |
|
117 |
# Caveats
|
118 |
|
119 |
-
Notice: this model integration is heavily under development. And in the process of finding good hyper-parameters.
|
|
|
|
|
|
|
|
|
|
|
|
---
license: cc-by-sa-3.0
language:
- de
---

# xLSTM Model trained on German Wikipedia

Research & development of an xLSTM model trained on German Wikipedia.

The Flair team is currently working on the integration of xLSTM (both LM training and fine-tuning models for downstream tasks).

For pretraining this xLSTM model, we use this [fork](https://github.com/HallerPatrick/helibrunna) (from [Patrick Haller](https://huggingface.co/PatrickHaller)) of the awesome [Helibrunna](https://github.com/AI-Guru/helibrunna) library.

Initially, we integrated xLSTM model training into Flair - for more information about this, please refer to the archived [flair-old](https://huggingface.co/stefan-it/xlstm-german-wikipedia/blob/flair-old/README.md) branch of this repository.

# Changelog

- 28.08.2024: Model training is now done with a [Helibrunna](https://github.com/AI-Guru/helibrunna) fork - find it [here](https://github.com/HallerPatrick/helibrunna).
- 10.06.2024: Initial version. xLSTM was trained with the Flair library, see the [old](https://huggingface.co/stefan-it/xlstm-german-wikipedia/blob/flair-old/README.md) branch.
# Training

The current model was trained with commit `f66cc55` from the [`main` branch](https://github.com/HallerPatrick/helibrunna) of the forked Helibrunna repo.
The `xlstm` [library](https://github.com/NX-AI/xlstm) needs to be installed manually - also make sure that `Ninja` is installed (`pip3 install Ninja`).
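A possible install sequence (an assumption based on the sentence above, not the project's documented steps - check the `xlstm` README for the exact invocation):

```shell
# Ninja is needed to compile the sLSTM CUDA kernels.
pip3 install Ninja

# Assumed manual install of the xlstm library from GitHub.
pip3 install git+https://github.com/NX-AI/xlstm.git
```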
The German Wikipedia dump from [this repository](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus) is used.

The following training configuration is used:

```yaml
description: "Train a wikipedia xLSTM"

training:
  model_name: "german_wikipedia"
  batch_size: 10
  lr: 6e-4
  lr_warmup_steps: 4584
  lr_decay_until_steps: "auto"
  lr_decay_factor: 0.001
  weight_decay: 0.1
  amp_precision: bfloat16
  weight_precision: float32
  enable_mixed_precision: true
  num_epochs: 1
  output_dir: "./output"
  save_every_step: 2000
  log_every_step: 10
  generate_every_step: 5000
  wandb_project: "xlstm"
  gradient_clipping: "auto"
  # wandb_project: "lovecraftxlstm"

model:
  num_blocks: 24
  embedding_dim: 768
  mlstm_block:
    mlstm:
      num_heads: 4
  slstm_block: {}
  slstm_at: []
  context_length: 512

dataset:
  output_path: "./output/german-wikipedia-dataset"
  hugging_face_id: ["stefan-it/dewiki-20230701"]
  split: "train" # Also subsetting is possible: "train[:100000]"
  shuffle: False
  seed: 42

tokenizer:
  type: "pretrained"
  pretrained_class: "LlamaTokenizer"
  pretrained_id: "meta-llama/Llama-2-7b-hf"
```
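For illustration, the warmup settings above imply a schedule along these lines. This is a sketch only: it assumes linear warmup to `lr` and linear decay down to `lr * lr_decay_factor`; `lr_decay_until_steps` is `"auto"` in the config, so the `100_000` below is an arbitrary stand-in, and Helibrunna's real decay shape may differ (e.g. cosine).

```python
def lr_at(step, peak_lr=6e-4, warmup_steps=4584,
          decay_until_steps=100_000, decay_factor=0.001):
    """Learning rate at a given step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Progress through the decay phase, clamped to [0, 1].
    t = min(1.0, (step - warmup_steps) / (decay_until_steps - warmup_steps))
    return peak_lr * (1.0 - t) + peak_lr * decay_factor * t

print(lr_at(0))        # 0.0 (start of warmup)
print(lr_at(4584))     # 0.0006 (peak learning rate)
print(lr_at(100_000))  # decayed floor, ~6e-07
```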
# Caveats

Notice: this model integration is heavily under development, and we are still in the process of finding good hyper-parameters. Downstream experiments are also coming soon.

Unfortunately, NaNs are occurring during training:

![Training Loss](training-loss.png)
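One common mitigation while hyper-parameters are still being tuned is to skip optimizer updates on non-finite losses. A generic sketch (not code from the training repo):

```python
import math

def should_skip_step(loss_value: float) -> bool:
    """Return True when the loss is NaN or infinite, so the optimizer step
    for this batch can be skipped instead of corrupting the weights."""
    return not math.isfinite(loss_value)

print(should_skip_step(7.61))          # False: finite loss, apply the update
print(should_skip_step(float("nan")))  # True: skip this batch
```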
best-lm.pt
DELETED
@@ -1,3 +0,0 @@

version https://git-lfs.github.com/spec/v1
oid sha256:b5e754202c41fb92228df2651c4e24d497f8446493802f45f43f0ea8d47a7ec8
size 36371434
loss.txt
DELETED
@@ -1,227 +0,0 @@

Per-split validation log of the old Flair training run (113 splits per epoch, 2 epochs). Validation loss falls from 7.6183 (ppl 2035.09) after the first split of epoch 1 to roughly 7.09 (ppl ~1200) in epoch 2, while the learning rate is annealed from 5.0 down to 0.0781. Excerpt:

| end of split 1 /113 | epoch 1 | time: 224.45s | valid loss 7.6183 | valid ppl 2035.0861 | learning rate 5.0000
| end of split 113 /113 | epoch 1 | time: 237.25s | valid loss 7.1007 | valid ppl 1212.7755 | learning rate 1.2500
| end of split 1 /113 | epoch 2 | time: 229.76s | valid loss 7.0779 | valid ppl 1185.4298 | learning rate 1.2500
| end of split 106 /113 | epoch 2 | time: 231.16s | valid loss 7.0949 | valid ppl 1205.8207 | learning rate 0.0781
|
220 |
-
| end of split 107 /113 | epoch 2 | time: 232.93s | valid loss 7.0896 | valid ppl 1199.3762 | learning rate 0.0781
|
221 |
-
| end of split 108 /113 | epoch 2 | time: 234.06s | valid loss 7.0961 | valid ppl 1207.2101 | learning rate 0.0781
|
222 |
-
| end of split 109 /113 | epoch 2 | time: 233.27s | valid loss 7.0883 | valid ppl 1197.8653 | learning rate 0.0781
|
223 |
-
| end of split 110 /113 | epoch 2 | time: 234.69s | valid loss 7.0930 | valid ppl 1203.4772 | learning rate 0.0781
|
224 |
-
| end of split 111 /113 | epoch 2 | time: 231.50s | valid loss 7.0946 | valid ppl 1205.4435 | learning rate 0.0781
|
225 |
-
| end of split 112 /113 | epoch 2 | time: 233.79s | valid loss 7.0864 | valid ppl 1195.5549 | learning rate 0.0781
|
226 |
-
| end of split 113 /113 | epoch 2 | time: 232.14s | valid loss 7.0906 | valid ppl 1200.6055 | learning rate 0.0781
|
227 |
-
TEST: valid loss 7.0908 | valid ppl 1200.8965
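The perplexity column in the log above is simply the exponential of the reported cross-entropy validation loss, which can be sanity-checked against the final TEST line (a minimal sketch; the two values agree up to the 4-decimal rounding of the logged loss):

```python
import math

# Values taken from the final "TEST" line of the training log above.
test_loss = 7.0908
test_ppl = 1200.8965

# Perplexity is exp(cross-entropy loss); check within rounding tolerance.
assert abs(math.exp(test_loss) - test_ppl) < 0.5
print(f"exp({test_loss}) = {math.exp(test_loss):.4f}")
```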
training.log
DELETED
The diff for this file is too large to render. See raw diff.