DGurgurov committed on
Commit
9371975
1 Parent(s): 39f6c0d

Upload 18 files

README.md CHANGED
@@ -1,3 +1,160 @@
1
  ---
2
- license: mit
3
  ---
1
  ---
2
+ license: apache-2.0
3
+ base_model: bert-base-multilingual-cased
4
+ tags:
5
+ - generated_from_trainer
6
+ metrics:
7
+ - accuracy
8
+ model-index:
9
+ - name: mt
10
+ results: []
11
  ---
12
+
13
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
+ should probably proofread and complete it, then remove this comment. -->
15
+
16
+ # mt
17
+
18
+ This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on an unknown dataset.
19
+ It achieves the following results on the evaluation set:
20
+ - Loss: 0.8117
21
+ - Accuracy: 0.8590
22
+
23
+ ## Model description
24
+
25
+ More information needed
26
+
27
+ ## Intended uses & limitations
28
+
29
+ More information needed
30
+
31
+ ## Training and evaluation data
32
+
33
+ More information needed
34
+
35
+ ## Training procedure
36
+
37
+ ### Training hyperparameters
38
+
39
+ The following hyperparameters were used during training:
40
+ - learning_rate: 5e-05
41
+ - train_batch_size: 16
42
+ - eval_batch_size: 16
43
+ - seed: 42
44
+ - distributed_type: multi-GPU
45
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
46
+ - lr_scheduler_type: linear
47
+ - training_steps: 50000
48
+
49
+ ### Training results
50
+
51
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
52
+ |:-------------:|:------:|:-----:|:---------------:|:--------:|
53
+ | 2.6689 | 1.04 | 500 | 2.1345 | 0.6677 |
54
+ | 2.1415 | 2.07 | 1000 | 1.8436 | 0.6926 |
55
+ | 1.9421 | 3.11 | 1500 | 1.7874 | 0.6907 |
56
+ | 1.7588 | 4.14 | 2000 | 1.7605 | 0.7013 |
57
+ | 1.6729 | 5.18 | 2500 | 1.7568 | 0.6957 |
58
+ | 1.596 | 6.21 | 3000 | 1.5006 | 0.7273 |
59
+ | 1.5778 | 7.25 | 3500 | 1.3924 | 0.7451 |
60
+ | 1.4821 | 8.28 | 4000 | 1.6097 | 0.7099 |
61
+ | 1.4183 | 9.32 | 4500 | 1.3552 | 0.7491 |
62
+ | 1.4197 | 10.35 | 5000 | 1.2847 | 0.7513 |
63
+ | 1.3156 | 11.39 | 5500 | 1.3173 | 0.7496 |
64
+ | 1.2882 | 12.42 | 6000 | 1.2817 | 0.7738 |
65
+ | 1.2692 | 13.46 | 6500 | 1.1892 | 0.7751 |
66
+ | 1.2368 | 14.49 | 7000 | 1.2363 | 0.7816 |
67
+ | 1.1975 | 15.53 | 7500 | 1.2442 | 0.7700 |
68
+ | 1.1907 | 16.56 | 8000 | 1.2569 | 0.7720 |
69
+ | 1.1231 | 17.6 | 8500 | 1.1386 | 0.7761 |
70
+ | 1.0873 | 18.63 | 9000 | 1.2105 | 0.7856 |
71
+ | 1.1242 | 19.67 | 9500 | 1.2142 | 0.7738 |
72
+ | 1.0367 | 20.7 | 10000 | 1.2121 | 0.7712 |
73
+ | 1.0869 | 21.74 | 10500 | 1.0782 | 0.7955 |
74
+ | 1.0353 | 22.77 | 11000 | 0.9918 | 0.8069 |
75
+ | 1.0324 | 23.81 | 11500 | 1.0908 | 0.7971 |
76
+ | 1.0145 | 24.84 | 12000 | 1.0945 | 0.7975 |
77
+ | 0.9951 | 25.88 | 12500 | 1.0005 | 0.8028 |
78
+ | 0.9483 | 26.92 | 13000 | 0.9638 | 0.8187 |
79
+ | 0.9304 | 27.95 | 13500 | 0.9761 | 0.8205 |
80
+ | 0.8835 | 28.99 | 14000 | 1.0620 | 0.8046 |
81
+ | 0.9097 | 30.02 | 14500 | 0.9138 | 0.8060 |
82
+ | 0.9293 | 31.06 | 15000 | 0.9180 | 0.8176 |
83
+ | 0.9043 | 32.09 | 15500 | 0.9215 | 0.8208 |
84
+ | 0.8581 | 33.13 | 16000 | 0.9625 | 0.8225 |
85
+ | 0.8638 | 34.16 | 16500 | 0.8586 | 0.8368 |
86
+ | 0.874 | 35.2 | 17000 | 1.0044 | 0.8135 |
87
+ | 0.8235 | 36.23 | 17500 | 0.9755 | 0.8184 |
88
+ | 0.8589 | 37.27 | 18000 | 0.9042 | 0.8292 |
89
+ | 0.8107 | 38.3 | 18500 | 0.8821 | 0.8272 |
90
+ | 0.8346 | 39.34 | 19000 | 0.9061 | 0.8248 |
91
+ | 0.8393 | 40.37 | 19500 | 0.9796 | 0.8235 |
92
+ | 0.789 | 41.41 | 20000 | 0.9015 | 0.8331 |
93
+ | 0.8121 | 42.44 | 20500 | 0.8589 | 0.8386 |
94
+ | 0.7709 | 43.48 | 21000 | 0.8836 | 0.8351 |
95
+ | 0.7922 | 44.51 | 21500 | 0.9524 | 0.8180 |
96
+ | 0.7457 | 45.55 | 22000 | 0.8350 | 0.8364 |
97
+ | 0.7386 | 46.58 | 22500 | 0.9025 | 0.8341 |
98
+ | 0.7515 | 47.62 | 23000 | 0.9092 | 0.8390 |
99
+ | 0.7324 | 48.65 | 23500 | 0.8322 | 0.8421 |
100
+ | 0.7314 | 49.69 | 24000 | 0.7968 | 0.8477 |
101
+ | 0.7442 | 50.72 | 24500 | 0.9305 | 0.8324 |
102
+ | 0.7074 | 51.76 | 25000 | 1.0011 | 0.8208 |
103
+ | 0.739 | 52.8 | 25500 | 0.8732 | 0.8331 |
104
+ | 0.7243 | 53.83 | 26000 | 0.7857 | 0.8480 |
105
+ | 0.6842 | 54.87 | 26500 | 0.7945 | 0.8377 |
106
+ | 0.6991 | 55.9 | 27000 | 0.9628 | 0.8275 |
107
+ | 0.6896 | 56.94 | 27500 | 0.8363 | 0.8410 |
108
+ | 0.6925 | 57.97 | 28000 | 0.8433 | 0.8392 |
109
+ | 0.7081 | 59.01 | 28500 | 1.0086 | 0.8223 |
110
+ | 0.6598 | 60.04 | 29000 | 0.9251 | 0.8333 |
111
+ | 0.6677 | 61.08 | 29500 | 0.8823 | 0.8437 |
112
+ | 0.695 | 62.11 | 30000 | 0.7751 | 0.8560 |
113
+ | 0.7108 | 63.15 | 30500 | 0.8452 | 0.8481 |
114
+ | 0.6721 | 64.18 | 31000 | 0.8560 | 0.8413 |
115
+ | 0.6571 | 65.22 | 31500 | 0.9800 | 0.8163 |
116
+ | 0.6891 | 66.25 | 32000 | 0.8106 | 0.8457 |
117
+ | 0.6541 | 67.29 | 32500 | 0.8197 | 0.8430 |
118
+ | 0.6559 | 68.32 | 33000 | 0.8678 | 0.8388 |
119
+ | 0.6554 | 69.36 | 33500 | 0.7396 | 0.8662 |
120
+ | 0.618 | 70.39 | 34000 | 0.8518 | 0.8376 |
121
+ | 0.6558 | 71.43 | 34500 | 0.7706 | 0.8409 |
122
+ | 0.6034 | 72.46 | 35000 | 0.7829 | 0.8518 |
123
+ | 0.6336 | 73.5 | 35500 | 0.7835 | 0.8591 |
124
+ | 0.6287 | 74.53 | 36000 | 0.7548 | 0.8575 |
125
+ | 0.6065 | 75.57 | 36500 | 0.8542 | 0.8508 |
126
+ | 0.6029 | 76.6 | 37000 | 0.8203 | 0.8405 |
127
+ | 0.6208 | 77.64 | 37500 | 0.7082 | 0.8661 |
128
+ | 0.64 | 78.67 | 38000 | 0.8505 | 0.8410 |
129
+ | 0.6144 | 79.71 | 38500 | 0.7246 | 0.8604 |
130
+ | 0.6507 | 80.75 | 39000 | 0.7150 | 0.8611 |
131
+ | 0.6177 | 81.78 | 39500 | 0.9332 | 0.84 |
132
+ | 0.6159 | 82.82 | 40000 | 0.6427 | 0.8733 |
133
+ | 0.5944 | 83.85 | 40500 | 0.7721 | 0.8411 |
134
+ | 0.6044 | 84.89 | 41000 | 0.8968 | 0.8449 |
135
+ | 0.6 | 85.92 | 41500 | 0.7673 | 0.8538 |
136
+ | 0.5899 | 86.96 | 42000 | 0.8039 | 0.8505 |
137
+ | 0.5812 | 87.99 | 42500 | 0.7467 | 0.8567 |
138
+ | 0.5977 | 89.03 | 43000 | 0.9534 | 0.8316 |
139
+ | 0.6019 | 90.06 | 43500 | 0.9170 | 0.8316 |
140
+ | 0.563 | 91.1 | 44000 | 0.7761 | 0.8569 |
141
+ | 0.6347 | 92.13 | 44500 | 0.7811 | 0.8577 |
142
+ | 0.5855 | 93.17 | 45000 | 0.7562 | 0.8606 |
143
+ | 0.6026 | 94.2 | 45500 | 0.7490 | 0.8636 |
144
+ | 0.5846 | 95.24 | 46000 | 0.7456 | 0.8487 |
145
+ | 0.5635 | 96.27 | 46500 | 0.8115 | 0.8495 |
146
+ | 0.5903 | 97.31 | 47000 | 0.8137 | 0.8448 |
147
+ | 0.576 | 98.34 | 47500 | 0.8441 | 0.8424 |
148
+ | 0.5745 | 99.38 | 48000 | 0.7266 | 0.8609 |
149
+ | 0.5915 | 100.41 | 48500 | 0.9169 | 0.8446 |
150
+ | 0.601 | 101.45 | 49000 | 0.7671 | 0.8576 |
151
+ | 0.5713 | 102.48 | 49500 | 0.7868 | 0.8487 |
152
+ | 0.5541 | 103.52 | 50000 | 0.7907 | 0.8569 |
153
+
154
+
155
+ ### Framework versions
156
+
157
+ - Transformers 4.35.2
158
+ - Pytorch 2.0.0
159
+ - Datasets 2.15.0
160
+ - Tokenizers 0.15.0
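The card's schedule settings can be cross-checked against the logged values. The sketch below assumes zero warmup steps (the card lists none) and a plain linear decay from the peak rate to 0 over `training_steps`. Under that assumption it reproduces the learning rates recorded in `trainer_state.json` (4.95e-05 at step 500, 2.35e-05 at step 26,500), and the table's Step/Epoch columns imply about 483 optimizer steps per epoch. `linear_lr` and `steps_per_epoch` are illustrative helpers, not Trainer APIs.

```python
PEAK_LR = 5e-05        # learning_rate from the card
TOTAL_STEPS = 50_000   # training_steps from the card
WARMUP_STEPS = 0       # assumption: the card lists no warmup


def linear_lr(step: int, peak: float = PEAK_LR, total: int = TOTAL_STEPS,
              warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to `peak`, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


def steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps per epoch implied by a logged (step, epoch) pair."""
    return step / epoch


if __name__ == "__main__":
    for step in (500, 26_500, 40_000):
        print(step, f"{linear_lr(step):.2e}")
    # Last row of the results table: step 50000 at epoch 103.52
    print(round(steps_per_epoch(50_000, 103.52)))
```

Under the same assumption, the rate at the best checkpoint (step 40,000) would have been 1e-05.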
logs/events.out.tfevents.1709072051.serv-9216.749410.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ddaec0ad57b7075839c79282f1d002e7aa290d85bb49e5bf3d69f8f197d2ae3a
3
+ size 4329
logs/events.out.tfevents.1709072142.serv-9216.750111.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5e6a37360b015f0683eb910f09c1f5a1ece82ca6586f31e930ae5f0ba705669c
3
+ size 53301
logs/events.out.tfevents.1709074878.serv-9216.750111.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e10e179d23b81bdc9f9927bf3b6fceadd28fa9097939aa5c6f762b6ccaf97d3
3
+ size 369
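The log, adapter, optimizer, RNG-state and scheduler files in this commit are stored as Git LFS pointer files: short text files whose `key value` lines give the spec version, the SHA-256 object id, and the size in bytes of the real file. The minimal parser below handles the three keys that appear here and keeps any other keys as raw strings without validating them; `LfsPointer`, `parse_lfs_pointer` and `matches` are hypothetical helper names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LfsPointer:
    version: str
    oid_algo: str  # hash algorithm prefix, e.g. "sha256"
    oid: str       # hex digest of the real file contents
    size: int      # size of the real file in bytes
    extra: dict = field(default_factory=dict)  # any keys beyond version/oid/size


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields.pop("oid").partition(":")
    return LfsPointer(
        version=fields.pop("version"),
        oid_algo=algo,
        oid=digest,
        size=int(fields.pop("size")),
        extra=fields,
    )


def matches(pointer: LfsPointer, data: bytes) -> bool:
    """True if `data` has the size and SHA-256 digest the pointer records."""
    return (
        pointer.oid_algo == "sha256"
        and len(data) == pointer.size
        and hashlib.sha256(data).hexdigest() == pointer.oid
    )


# Pointer for logs/events.out.tfevents.1709074878.serv-9216.750111.1, copied from this commit
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6e10e179d23b81bdc9f9927bf3b6fceadd28fa9097939aa5c6f762b6ccaf97d3
size 369
"""

if __name__ == "__main__":
    p = parse_lfs_pointer(POINTER)
    print(p.oid_algo, p.size)
```

A check like `matches` is useful because a clone made without Git LFS leaves these small pointer files in place of the real binaries.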
logs/mt_cn_lang_adapter.png ADDED
mlm/adapter_config.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "config": {
3
+ "adapter_residual_before_ln": false,
4
+ "cross_adapter": false,
5
+ "factorized_phm_W": true,
6
+ "factorized_phm_rule": false,
7
+ "hypercomplex_nonlinearity": "glorot-uniform",
8
+ "init_weights": "bert",
9
+ "inv_adapter": null,
10
+ "inv_adapter_reduction_factor": null,
11
+ "is_parallel": false,
12
+ "learn_phm": true,
13
+ "leave_out": [],
14
+ "ln_after": false,
15
+ "ln_before": false,
16
+ "mh_adapter": false,
17
+ "non_linearity": "relu",
18
+ "original_ln_after": true,
19
+ "original_ln_before": true,
20
+ "output_adapter": true,
21
+ "phm_bias": true,
22
+ "phm_c_init": "normal",
23
+ "phm_dim": 4,
24
+ "phm_init_range": 0.0001,
25
+ "phm_layer": false,
26
+ "phm_rank": 1,
27
+ "reduction_factor": 16,
28
+ "residual_before_ln": true,
29
+ "scaling": 1.0,
30
+ "shared_W_phm": false,
31
+ "shared_phm_rule": true,
32
+ "use_gating": false
33
+ },
34
+ "config_id": "9076f36a74755ac4",
35
+ "hidden_size": 768,
36
+ "model_class": "BertForMaskedLM",
37
+ "model_name": "bert-base-multilingual-cased",
38
+ "model_type": "bert",
39
+ "name": "mlm",
40
+ "version": "0.1.1"
41
+ }
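The adapter config places one bottleneck adapter after each layer's output sublayer (`output_adapter: true`, `mh_adapter: false`), with `reduction_factor: 16` on `hidden_size: 768`, i.e. a bottleneck width of 48, and no extra adapter LayerNorms (`ln_before` and `ln_after` are false). The sketch below assumes each adapter is a down-projection plus an up-projection, both with biases, and assumes the 12 transformer layers of BERT-base (the config itself does not state the layer count). The result, 894,528 parameters or 3,578,112 bytes in fp32, is close to the 3,594,917-byte `mlm/pytorch_adapter.bin`, with the small remainder plausibly serialization overhead.

```python
HIDDEN_SIZE = 768        # "hidden_size" in adapter_config.json
REDUCTION_FACTOR = 16    # "reduction_factor" in adapter_config.json
NUM_LAYERS = 12          # assumption: BERT-base depth, not stated in the config


def bottleneck_params(hidden: int, reduction: int) -> int:
    """Weights + biases of one down/up bottleneck adapter (no adapter LayerNorm)."""
    bottleneck = hidden // reduction
    down = hidden * bottleneck + bottleneck   # hidden -> bottleneck
    up = bottleneck * hidden + hidden         # bottleneck -> hidden
    return down + up


if __name__ == "__main__":
    per_layer = bottleneck_params(HIDDEN_SIZE, REDUCTION_FACTOR)
    total = per_layer * NUM_LAYERS  # one output adapter per layer
    print(HIDDEN_SIZE // REDUCTION_FACTOR, per_layer, total, total * 4)
```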
mlm/head_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "config": null,
3
+ "hidden_size": 768,
4
+ "label2id": {
5
+ "LABEL_0": 0,
6
+ "LABEL_1": 1
7
+ },
8
+ "model_class": "BertForMaskedLM",
9
+ "model_name": "bert-base-multilingual-cased",
10
+ "model_type": "bert",
11
+ "name": null,
12
+ "num_labels": 2,
13
+ "version": "0.1.1"
14
+ }
mlm/pytorch_adapter.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a77f9e21137b7aacd782b79dcf7af527615eefa4cd69bdf8b41385f3b4da3e81
3
+ size 3594917
mlm/pytorch_model_head.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ad4d635f02fc17009f80b4a4372a46dc2e948966597fa74cd2facd69fbb75139
3
+ size 370097519
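`mlm/pytorch_model_head.bin` (370,097,519 bytes) is roughly 100 times larger than the adapter. One plausible explanation, assuming the file holds a standard BERT masked-LM head in fp32 that includes its own copy of the vocabulary projection (normally tied to the input word embeddings), and assuming mBERT-cased's 119,547-entry vocabulary: nearly all of the size is the vocabulary-by-hidden decoder matrix. The estimate below comes to 370,095,084 bytes, within a few kilobytes of the file size; `mlm_head_params` is a hypothetical helper.

```python
VOCAB_SIZE = 119_547  # assumption: bert-base-multilingual-cased vocabulary size
HIDDEN = 768          # hidden_size from head_config.json


def mlm_head_params(vocab: int, hidden: int) -> dict:
    """Parameter counts for a BERT-style MLM head: dense transform + LayerNorm,
    then a vocab-sized decoder with a single output bias."""
    return {
        "transform_dense": hidden * hidden + hidden,
        "transform_layernorm": 2 * hidden,
        "decoder_weight": vocab * hidden,
        "decoder_bias": vocab,
    }


if __name__ == "__main__":
    parts = mlm_head_params(VOCAB_SIZE, HIDDEN)
    total = sum(parts.values())
    print(total, total * 4)  # parameter count, fp32 bytes
```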
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8245911fe34ac3e760ee802b087864bb19b4cd1900156cdbbe03d5bcb3c4cbfd
3
+ size 11936581
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b634be18e10c459815918b699c7ee3f3bd71eb974625fdddbaac9be32efcbfc0
3
+ size 14575
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:45ac04db287f65dfb1b249a2c231138e4e9cf7b14bf5ff37aea1e4cda5391828
3
+ size 627
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": false,
47
+ "mask_token": "[MASK]",
48
+ "model_max_length": 512,
49
+ "pad_token": "[PAD]",
50
+ "sep_token": "[SEP]",
51
+ "strip_accents": null,
52
+ "tokenize_chinese_chars": true,
53
+ "tokenizer_class": "BertTokenizer",
54
+ "unk_token": "[UNK]"
55
+ }
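`tokenizer_config.json` pins the five special tokens to ids 0 and 100 to 103 through `added_tokens_decoder`, and `do_lower_case: false` keeps the cased vocabulary. The sketch below reads that mapping with the standard library and frames one already-tokenized segment the way BERT expects a single sequence, `[CLS] tokens [SEP]`, right-padded with `[PAD]`. The body ids 2000 and 2001 are hypothetical placeholders, not real vocabulary entries, and in practice `pad_to` should not exceed `model_max_length` (512).

```python
import json

# Trimmed copy of "added_tokens_decoder" from tokenizer_config.json
# (the lstrip/rstrip/normalized/single_word flags are omitted here).
CONFIG = json.loads("""
{"added_tokens_decoder": {
  "0":   {"content": "[PAD]",  "special": true},
  "100": {"content": "[UNK]",  "special": true},
  "101": {"content": "[CLS]",  "special": true},
  "102": {"content": "[SEP]",  "special": true},
  "103": {"content": "[MASK]", "special": true}
}}
""")


def special_token_ids(config: dict) -> dict:
    """Map each special-token string, e.g. "[CLS]", to its integer id."""
    return {
        entry["content"]: int(idx)
        for idx, entry in config["added_tokens_decoder"].items()
        if entry.get("special")
    }


def frame_and_pad(body_ids: list, ids: dict, pad_to: int) -> list:
    """Wrap one segment as [CLS] body [SEP], then right-pad with [PAD]."""
    seq = [ids["[CLS]"], *body_ids, ids["[SEP]"]]
    if len(seq) > pad_to:
        raise ValueError(f"{len(seq)} tokens exceed pad_to={pad_to}")
    return seq + [ids["[PAD]"]] * (pad_to - len(seq))


if __name__ == "__main__":
    ids = special_token_ids(CONFIG)
    print(frame_and_pad([2000, 2001], ids, pad_to=6))  # [101, 2000, 2001, 102, 0, 0]
```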
trainer_state.json ADDED
@@ -0,0 +1,1219 @@
1
+ {
2
+ "best_metric": 0.6427481174468994,
3
+ "best_model_checkpoint": "./models/adapters_mlm_cn/mt/checkpoint-40000",
4
+ "epoch": 82.81573498964804,
5
+ "eval_steps": 500,
6
+ "global_step": 40000,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 1.04,
13
+ "learning_rate": 4.9500000000000004e-05,
14
+ "loss": 2.6689,
15
+ "step": 500
16
+ },
17
+ {
18
+ "epoch": 1.04,
19
+ "eval_accuracy": 0.6676783004552352,
20
+ "eval_loss": 2.1344573497772217,
21
+ "eval_runtime": 1.6941,
22
+ "eval_samples_per_second": 506.449,
23
+ "eval_steps_per_second": 31.874,
24
+ "step": 500
25
+ },
26
+ {
27
+ "epoch": 2.07,
28
+ "learning_rate": 4.9e-05,
29
+ "loss": 2.1415,
30
+ "step": 1000
31
+ },
32
+ {
33
+ "epoch": 2.07,
34
+ "eval_accuracy": 0.6926470588235294,
35
+ "eval_loss": 1.8435733318328857,
36
+ "eval_runtime": 1.6896,
37
+ "eval_samples_per_second": 507.824,
38
+ "eval_steps_per_second": 31.961,
39
+ "step": 1000
40
+ },
41
+ {
42
+ "epoch": 3.11,
43
+ "learning_rate": 4.85e-05,
44
+ "loss": 1.9421,
45
+ "step": 1500
46
+ },
47
+ {
48
+ "epoch": 3.11,
49
+ "eval_accuracy": 0.690674753601213,
50
+ "eval_loss": 1.787391185760498,
51
+ "eval_runtime": 1.6956,
52
+ "eval_samples_per_second": 506.016,
53
+ "eval_steps_per_second": 31.847,
54
+ "step": 1500
55
+ },
56
+ {
57
+ "epoch": 4.14,
58
+ "learning_rate": 4.8e-05,
59
+ "loss": 1.7588,
60
+ "step": 2000
61
+ },
62
+ {
63
+ "epoch": 4.14,
64
+ "eval_accuracy": 0.7012509197939661,
65
+ "eval_loss": 1.760498285293579,
66
+ "eval_runtime": 1.6894,
67
+ "eval_samples_per_second": 507.882,
68
+ "eval_steps_per_second": 31.965,
69
+ "step": 2000
70
+ },
71
+ {
72
+ "epoch": 5.18,
73
+ "learning_rate": 4.75e-05,
74
+ "loss": 1.6729,
75
+ "step": 2500
76
+ },
77
+ {
78
+ "epoch": 5.18,
79
+ "eval_accuracy": 0.6956845238095238,
80
+ "eval_loss": 1.7567747831344604,
81
+ "eval_runtime": 1.6937,
82
+ "eval_samples_per_second": 506.597,
83
+ "eval_steps_per_second": 31.884,
84
+ "step": 2500
85
+ },
86
+ {
87
+ "epoch": 6.21,
88
+ "learning_rate": 4.7e-05,
89
+ "loss": 1.596,
90
+ "step": 3000
91
+ },
92
+ {
93
+ "epoch": 6.21,
94
+ "eval_accuracy": 0.7272727272727273,
95
+ "eval_loss": 1.5006115436553955,
96
+ "eval_runtime": 1.6969,
97
+ "eval_samples_per_second": 505.627,
98
+ "eval_steps_per_second": 31.823,
99
+ "step": 3000
100
+ },
101
+ {
102
+ "epoch": 7.25,
103
+ "learning_rate": 4.6500000000000005e-05,
104
+ "loss": 1.5778,
105
+ "step": 3500
106
+ },
107
+ {
108
+ "epoch": 7.25,
109
+ "eval_accuracy": 0.7450832072617246,
110
+ "eval_loss": 1.3923866748809814,
111
+ "eval_runtime": 1.6898,
112
+ "eval_samples_per_second": 507.75,
113
+ "eval_steps_per_second": 31.956,
114
+ "step": 3500
115
+ },
116
+ {
117
+ "epoch": 8.28,
118
+ "learning_rate": 4.600000000000001e-05,
119
+ "loss": 1.4821,
120
+ "step": 4000
121
+ },
122
+ {
123
+ "epoch": 8.28,
124
+ "eval_accuracy": 0.7099236641221374,
125
+ "eval_loss": 1.609680414199829,
126
+ "eval_runtime": 1.6898,
127
+ "eval_samples_per_second": 507.758,
128
+ "eval_steps_per_second": 31.957,
129
+ "step": 4000
130
+ },
131
+ {
132
+ "epoch": 9.32,
133
+ "learning_rate": 4.55e-05,
134
+ "loss": 1.4183,
135
+ "step": 4500
136
+ },
137
+ {
138
+ "epoch": 9.32,
139
+ "eval_accuracy": 0.7490551776266062,
140
+ "eval_loss": 1.3551626205444336,
141
+ "eval_runtime": 1.6905,
142
+ "eval_samples_per_second": 507.539,
143
+ "eval_steps_per_second": 31.943,
144
+ "step": 4500
145
+ },
146
+ {
147
+ "epoch": 10.35,
148
+ "learning_rate": 4.5e-05,
149
+ "loss": 1.4197,
150
+ "step": 5000
151
+ },
152
+ {
153
+ "epoch": 10.35,
154
+ "eval_accuracy": 0.7513471901462664,
155
+ "eval_loss": 1.284741997718811,
156
+ "eval_runtime": 1.6878,
157
+ "eval_samples_per_second": 508.353,
158
+ "eval_steps_per_second": 31.994,
159
+ "step": 5000
160
+ },
161
+ {
162
+ "epoch": 11.39,
163
+ "learning_rate": 4.4500000000000004e-05,
164
+ "loss": 1.3156,
165
+ "step": 5500
166
+ },
167
+ {
168
+ "epoch": 11.39,
169
+ "eval_accuracy": 0.7496318114874816,
170
+ "eval_loss": 1.3172950744628906,
171
+ "eval_runtime": 1.6891,
172
+ "eval_samples_per_second": 507.956,
173
+ "eval_steps_per_second": 31.969,
174
+ "step": 5500
175
+ },
176
+ {
177
+ "epoch": 12.42,
178
+ "learning_rate": 4.4000000000000006e-05,
179
+ "loss": 1.2882,
180
+ "step": 6000
181
+ },
182
+ {
183
+ "epoch": 12.42,
184
+ "eval_accuracy": 0.7738095238095238,
185
+ "eval_loss": 1.2816879749298096,
186
+ "eval_runtime": 1.6955,
187
+ "eval_samples_per_second": 506.058,
188
+ "eval_steps_per_second": 31.85,
189
+ "step": 6000
190
+ },
191
+ {
192
+ "epoch": 13.46,
193
+ "learning_rate": 4.35e-05,
194
+ "loss": 1.2692,
195
+ "step": 6500
196
+ },
197
+ {
198
+ "epoch": 13.46,
199
+ "eval_accuracy": 0.775112443778111,
200
+ "eval_loss": 1.189226746559143,
201
+ "eval_runtime": 1.6876,
202
+ "eval_samples_per_second": 508.403,
203
+ "eval_steps_per_second": 31.997,
204
+ "step": 6500
205
+ },
206
+ {
207
+ "epoch": 14.49,
208
+ "learning_rate": 4.3e-05,
209
+ "loss": 1.2368,
210
+ "step": 7000
211
+ },
212
+ {
213
+ "epoch": 14.49,
214
+ "eval_accuracy": 0.7816432272390822,
215
+ "eval_loss": 1.2362936735153198,
216
+ "eval_runtime": 1.6828,
217
+ "eval_samples_per_second": 509.861,
218
+ "eval_steps_per_second": 32.089,
219
+ "step": 7000
220
+ },
221
+ {
222
+ "epoch": 15.53,
223
+ "learning_rate": 4.25e-05,
224
+ "loss": 1.1975,
225
+ "step": 7500
226
+ },
227
+ {
228
+ "epoch": 15.53,
229
+ "eval_accuracy": 0.76996336996337,
230
+ "eval_loss": 1.2442289590835571,
231
+ "eval_runtime": 1.6849,
232
+ "eval_samples_per_second": 509.233,
233
+ "eval_steps_per_second": 32.05,
234
+ "step": 7500
235
+ },
236
+ {
237
+ "epoch": 16.56,
238
+ "learning_rate": 4.2e-05,
239
+ "loss": 1.1907,
240
+ "step": 8000
241
+ },
242
+ {
243
+ "epoch": 16.56,
244
+ "eval_accuracy": 0.7720320466132556,
245
+ "eval_loss": 1.256901502609253,
246
+ "eval_runtime": 1.6884,
247
+ "eval_samples_per_second": 508.188,
248
+ "eval_steps_per_second": 31.984,
249
+ "step": 8000
250
+ },
251
+ {
252
+ "epoch": 17.6,
253
+ "learning_rate": 4.15e-05,
254
+ "loss": 1.1231,
255
+ "step": 8500
256
+ },
257
+ {
258
+ "epoch": 17.6,
259
+ "eval_accuracy": 0.776085300837776,
260
+ "eval_loss": 1.13861083984375,
261
+ "eval_runtime": 1.6842,
262
+ "eval_samples_per_second": 509.436,
263
+ "eval_steps_per_second": 32.062,
264
+ "step": 8500
265
+ },
266
+ {
267
+ "epoch": 18.63,
268
+ "learning_rate": 4.1e-05,
269
+ "loss": 1.0873,
270
+ "step": 9000
271
+ },
272
+ {
273
+ "epoch": 18.63,
274
+ "eval_accuracy": 0.7855547282204021,
275
+ "eval_loss": 1.2104856967926025,
276
+ "eval_runtime": 1.6846,
277
+ "eval_samples_per_second": 509.309,
278
+ "eval_steps_per_second": 32.054,
279
+ "step": 9000
280
+ },
281
+ {
282
+ "epoch": 19.67,
283
+ "learning_rate": 4.05e-05,
284
+ "loss": 1.1242,
285
+ "step": 9500
286
+ },
287
+ {
288
+ "epoch": 19.67,
289
+ "eval_accuracy": 0.7737909516380655,
290
+ "eval_loss": 1.214229702949524,
291
+ "eval_runtime": 1.6848,
292
+ "eval_samples_per_second": 509.258,
293
+ "eval_steps_per_second": 32.051,
294
+ "step": 9500
295
+ },
296
+ {
297
+ "epoch": 20.7,
298
+ "learning_rate": 4e-05,
299
+ "loss": 1.0367,
300
+ "step": 10000
301
+ },
302
+ {
303
+ "epoch": 20.7,
304
+ "eval_accuracy": 0.7712369597615499,
305
+ "eval_loss": 1.2120734453201294,
306
+ "eval_runtime": 1.6938,
307
+ "eval_samples_per_second": 506.56,
308
+ "eval_steps_per_second": 31.881,
309
+ "step": 10000
310
+ },
311
+ {
312
+ "epoch": 21.74,
313
+ "learning_rate": 3.9500000000000005e-05,
314
+ "loss": 1.0869,
315
+ "step": 10500
316
+ },
317
+ {
318
+ "epoch": 21.74,
319
+ "eval_accuracy": 0.7955390334572491,
320
+ "eval_loss": 1.0782362222671509,
321
+ "eval_runtime": 1.6863,
322
+ "eval_samples_per_second": 508.795,
323
+ "eval_steps_per_second": 32.022,
324
+ "step": 10500
325
+ },
326
+ {
327
+ "epoch": 22.77,
328
+ "learning_rate": 3.9000000000000006e-05,
329
+ "loss": 1.0353,
330
+ "step": 11000
331
+ },
332
+ {
333
+ "epoch": 22.77,
334
+ "eval_accuracy": 0.8068535825545171,
335
+ "eval_loss": 0.9917858839035034,
336
+ "eval_runtime": 1.6841,
337
+ "eval_samples_per_second": 509.483,
338
+ "eval_steps_per_second": 32.065,
339
+ "step": 11000
340
+ },
341
+ {
342
+ "epoch": 23.81,
343
+ "learning_rate": 3.85e-05,
344
+ "loss": 1.0324,
345
+ "step": 11500
346
+ },
347
+ {
348
+ "epoch": 23.81,
349
+ "eval_accuracy": 0.7971233913701741,
350
+ "eval_loss": 1.0908266305923462,
351
+ "eval_runtime": 1.6848,
352
+ "eval_samples_per_second": 509.246,
353
+ "eval_steps_per_second": 32.05,
354
+ "step": 11500
355
+ },
356
+ {
357
+ "epoch": 24.84,
358
+ "learning_rate": 3.8e-05,
359
+ "loss": 1.0145,
360
+ "step": 12000
361
+ },
362
+ {
363
+ "epoch": 24.84,
364
+ "eval_accuracy": 0.7975460122699386,
365
+ "eval_loss": 1.0944875478744507,
366
+ "eval_runtime": 1.6827,
367
+ "eval_samples_per_second": 509.9,
368
+ "eval_steps_per_second": 32.092,
369
+ "step": 12000
370
+ },
371
+ {
372
+ "epoch": 25.88,
373
+ "learning_rate": 3.7500000000000003e-05,
374
+ "loss": 0.9951,
375
+ "step": 12500
376
+ },
377
+ {
378
+ "epoch": 25.88,
379
+ "eval_accuracy": 0.8028064992614475,
380
+ "eval_loss": 1.000519037246704,
381
+ "eval_runtime": 1.6933,
382
+ "eval_samples_per_second": 506.714,
383
+ "eval_steps_per_second": 31.891,
384
+ "step": 12500
385
+ },
386
+ {
387
+ "epoch": 26.92,
388
+ "learning_rate": 3.7e-05,
389
+ "loss": 0.9483,
390
+ "step": 13000
391
+ },
392
+ {
393
+ "epoch": 26.92,
394
+ "eval_accuracy": 0.8186646433990895,
395
+ "eval_loss": 0.963790237903595,
396
+ "eval_runtime": 1.6874,
397
+ "eval_samples_per_second": 508.479,
398
+ "eval_steps_per_second": 32.002,
399
+ "step": 13000
400
+ },
401
+ {
402
+ "epoch": 27.95,
403
+ "learning_rate": 3.65e-05,
404
+ "loss": 0.9304,
405
+ "step": 13500
406
+ },
407
+ {
408
+ "epoch": 27.95,
409
+ "eval_accuracy": 0.8204747774480712,
410
+ "eval_loss": 0.9761123657226562,
411
+ "eval_runtime": 1.6869,
412
+ "eval_samples_per_second": 508.622,
413
+ "eval_steps_per_second": 32.011,
414
+ "step": 13500
415
+ },
416
+ {
417
+ "epoch": 28.99,
418
+ "learning_rate": 3.6e-05,
419
+ "loss": 0.8835,
420
+ "step": 14000
421
+ },
422
+ {
423
+ "epoch": 28.99,
424
+ "eval_accuracy": 0.8045801526717558,
425
+ "eval_loss": 1.062032699584961,
426
+ "eval_runtime": 1.6883,
427
+ "eval_samples_per_second": 508.21,
428
+ "eval_steps_per_second": 31.985,
429
+ "step": 14000
430
+ },
431
+ {
432
+ "epoch": 30.02,
433
+ "learning_rate": 3.55e-05,
434
+ "loss": 0.9097,
435
+ "step": 14500
436
+ },
437
+ {
438
+ "epoch": 30.02,
439
+ "eval_accuracy": 0.806015037593985,
440
+ "eval_loss": 0.9137569069862366,
441
+ "eval_runtime": 1.6924,
442
+ "eval_samples_per_second": 506.97,
443
+ "eval_steps_per_second": 31.907,
444
+ "step": 14500
445
+ },
446
+ {
447
+ "epoch": 31.06,
448
+ "learning_rate": 3.5e-05,
449
+ "loss": 0.9293,
450
+ "step": 15000
451
+ },
452
+ {
453
+ "epoch": 31.06,
454
+ "eval_accuracy": 0.8176197836166924,
455
+ "eval_loss": 0.918023943901062,
456
+ "eval_runtime": 1.6905,
457
+ "eval_samples_per_second": 507.53,
458
+ "eval_steps_per_second": 31.942,
459
+ "step": 15000
460
+ },
461
+ {
462
+ "epoch": 32.09,
463
+ "learning_rate": 3.45e-05,
464
+ "loss": 0.9043,
465
+ "step": 15500
466
+ },
467
+ {
468
+ "epoch": 32.09,
469
+ "eval_accuracy": 0.8208269525267994,
470
+ "eval_loss": 0.9214709401130676,
471
+ "eval_runtime": 1.691,
472
+ "eval_samples_per_second": 507.403,
473
+ "eval_steps_per_second": 31.934,
474
+ "step": 15500
475
+ },
476
+ {
477
+ "epoch": 33.13,
478
+ "learning_rate": 3.4000000000000007e-05,
479
+ "loss": 0.8581,
480
+ "step": 16000
481
+ },
482
+ {
483
+ "epoch": 33.13,
484
+ "eval_accuracy": 0.822452229299363,
485
+ "eval_loss": 0.9624596834182739,
486
+ "eval_runtime": 1.6897,
487
+ "eval_samples_per_second": 507.793,
488
+ "eval_steps_per_second": 31.959,
489
+ "step": 16000
490
+ },
491
+ {
492
+ "epoch": 34.16,
493
+ "learning_rate": 3.35e-05,
494
+ "loss": 0.8638,
495
+ "step": 16500
496
+ },
497
+ {
498
+ "epoch": 34.16,
499
+ "eval_accuracy": 0.8367816091954023,
500
+ "eval_loss": 0.8585591316223145,
501
+ "eval_runtime": 1.6912,
502
+ "eval_samples_per_second": 507.329,
503
+ "eval_steps_per_second": 31.93,
504
+ "step": 16500
505
+ },
506
+ {
507
+ "epoch": 35.2,
508
+ "learning_rate": 3.3e-05,
509
+ "loss": 0.874,
510
+ "step": 17000
511
+ },
512
+ {
513
+ "epoch": 35.2,
514
+ "eval_accuracy": 0.8135072908672295,
515
+ "eval_loss": 1.0043973922729492,
516
+ "eval_runtime": 1.6896,
517
+ "eval_samples_per_second": 507.801,
518
+ "eval_steps_per_second": 31.96,
519
+ "step": 17000
520
+ },
521
+ {
522
+ "epoch": 36.23,
523
+ "learning_rate": 3.2500000000000004e-05,
524
+ "loss": 0.8235,
525
+ "step": 17500
526
+ },
527
+ {
528
+ "epoch": 36.23,
529
+ "eval_accuracy": 0.8183890577507599,
530
+ "eval_loss": 0.9755066633224487,
531
+ "eval_runtime": 1.6947,
532
+ "eval_samples_per_second": 506.289,
533
+ "eval_steps_per_second": 31.864,
534
+ "step": 17500
535
+ },
536
+ {
537
+ "epoch": 37.27,
538
+ "learning_rate": 3.2000000000000005e-05,
539
+ "loss": 0.8589,
540
+ "step": 18000
541
+ },
542
+ {
543
+ "epoch": 37.27,
544
+ "eval_accuracy": 0.8291761148904006,
545
+ "eval_loss": 0.9042153358459473,
546
+ "eval_runtime": 1.6905,
547
+ "eval_samples_per_second": 507.55,
548
+ "eval_steps_per_second": 31.944,
549
+ "step": 18000
550
+ },
551
+ {
552
+ "epoch": 38.3,
553
+ "learning_rate": 3.15e-05,
554
+ "loss": 0.8107,
555
+ "step": 18500
556
+ },
557
+ {
558
+ "epoch": 38.3,
559
+ "eval_accuracy": 0.8272327964860908,
560
+ "eval_loss": 0.8821109533309937,
561
+ "eval_runtime": 1.6895,
562
+ "eval_samples_per_second": 507.845,
563
+ "eval_steps_per_second": 31.962,
564
+ "step": 18500
565
+ },
566
+ {
567
+ "epoch": 39.34,
568
+ "learning_rate": 3.1e-05,
569
+ "loss": 0.8346,
570
+ "step": 19000
571
+ },
572
+ {
573
+ "epoch": 39.34,
574
+ "eval_accuracy": 0.8248286367098249,
575
+ "eval_loss": 0.9061236381530762,
576
+ "eval_runtime": 1.6919,
577
+ "eval_samples_per_second": 507.136,
578
+ "eval_steps_per_second": 31.918,
579
+ "step": 19000
580
+ },
581
+ {
582
+ "epoch": 40.37,
583
+ "learning_rate": 3.05e-05,
584
+ "loss": 0.8393,
585
+ "step": 19500
586
+ },
587
+ {
588
+ "epoch": 40.37,
589
+ "eval_accuracy": 0.8234854151084517,
590
+ "eval_loss": 0.9795840978622437,
591
+ "eval_runtime": 1.6939,
592
+ "eval_samples_per_second": 506.513,
593
+ "eval_steps_per_second": 31.878,
594
+ "step": 19500
595
+ },
596
+ {
597
+ "epoch": 41.41,
598
+ "learning_rate": 3e-05,
599
+ "loss": 0.789,
600
+ "step": 20000
601
+ },
602
+ {
603
+ "epoch": 41.41,
604
+ "eval_accuracy": 0.833076923076923,
605
+ "eval_loss": 0.9014851450920105,
606
+ "eval_runtime": 1.689,
607
+ "eval_samples_per_second": 508.0,
608
+ "eval_steps_per_second": 31.972,
609
+ "step": 20000
610
+ },
611
+ {
612
+ "epoch": 42.44,
613
+ "learning_rate": 2.95e-05,
614
+ "loss": 0.8121,
615
+ "step": 20500
616
+ },
617
+ {
618
+ "epoch": 42.44,
619
+ "eval_accuracy": 0.8385913426265591,
620
+ "eval_loss": 0.8589309453964233,
621
+ "eval_runtime": 1.6873,
622
+ "eval_samples_per_second": 508.516,
623
+ "eval_steps_per_second": 32.005,
624
+ "step": 20500
625
+ },
626
+ {
627
+ "epoch": 43.48,
628
+ "learning_rate": 2.9e-05,
629
+ "loss": 0.7709,
630
+ "step": 21000
631
+ },
632
+ {
633
+ "epoch": 43.48,
634
+ "eval_accuracy": 0.8350903614457831,
635
+ "eval_loss": 0.8835715055465698,
636
+ "eval_runtime": 1.6829,
637
+ "eval_samples_per_second": 509.835,
638
+ "eval_steps_per_second": 32.088,
639
+ "step": 21000
640
+ },
641
+ {
642
+ "epoch": 44.51,
643
+ "learning_rate": 2.8499999999999998e-05,
644
+ "loss": 0.7922,
645
+ "step": 21500
646
+ },
647
+ {
648
+ "epoch": 44.51,
649
+ "eval_accuracy": 0.817974105102818,
650
+ "eval_loss": 0.9523779153823853,
651
+ "eval_runtime": 1.6863,
652
+ "eval_samples_per_second": 508.799,
653
+ "eval_steps_per_second": 32.022,
654
+ "step": 21500
655
+ },
656
+ {
657
+ "epoch": 45.55,
658
+ "learning_rate": 2.8000000000000003e-05,
659
+ "loss": 0.7457,
660
+ "step": 22000
661
+ },
662
+ {
663
+ "epoch": 45.55,
664
+ "eval_accuracy": 0.8364451082897685,
665
+ "eval_loss": 0.8350428938865662,
666
+ "eval_runtime": 1.6901,
667
+ "eval_samples_per_second": 507.673,
668
+ "eval_steps_per_second": 31.951,
669
+ "step": 22000
670
+ },
671
+ {
672
+ "epoch": 46.58,
673
+ "learning_rate": 2.7500000000000004e-05,
674
+ "loss": 0.7386,
675
+ "step": 22500
676
+ },
677
+ {
678
+ "epoch": 46.58,
679
+ "eval_accuracy": 0.8340807174887892,
680
+ "eval_loss": 0.9024766087532043,
681
+ "eval_runtime": 1.6912,
682
+ "eval_samples_per_second": 507.34,
683
+ "eval_steps_per_second": 31.93,
684
+ "step": 22500
685
+ },
686
+ {
687
+ "epoch": 47.62,
688
+ "learning_rate": 2.7000000000000002e-05,
689
+ "loss": 0.7515,
690
+ "step": 23000
691
+ },
692
+ {
693
+ "epoch": 47.62,
694
+ "eval_accuracy": 0.8390166534496432,
695
+ "eval_loss": 0.9091906547546387,
696
+ "eval_runtime": 1.686,
697
+ "eval_samples_per_second": 508.899,
698
+ "eval_steps_per_second": 32.029,
699
+ "step": 23000
700
+ },
701
+ {
+ "epoch": 48.65,
+ "learning_rate": 2.6500000000000004e-05,
+ "loss": 0.7324,
+ "step": 23500
+ },
+ {
+ "epoch": 48.65,
+ "eval_accuracy": 0.8420647149460708,
+ "eval_loss": 0.8322407007217407,
+ "eval_runtime": 1.6918,
+ "eval_samples_per_second": 507.153,
+ "eval_steps_per_second": 31.919,
+ "step": 23500
+ },
+ {
+ "epoch": 49.69,
+ "learning_rate": 2.6000000000000002e-05,
+ "loss": 0.7314,
+ "step": 24000
+ },
+ {
+ "epoch": 49.69,
+ "eval_accuracy": 0.8477078477078477,
+ "eval_loss": 0.7967829704284668,
+ "eval_runtime": 1.6933,
+ "eval_samples_per_second": 506.713,
+ "eval_steps_per_second": 31.891,
+ "step": 24000
+ },
+ {
+ "epoch": 50.72,
+ "learning_rate": 2.5500000000000003e-05,
+ "loss": 0.7442,
+ "step": 24500
+ },
+ {
+ "epoch": 50.72,
+ "eval_accuracy": 0.8324407039020658,
+ "eval_loss": 0.930473268032074,
+ "eval_runtime": 1.6828,
+ "eval_samples_per_second": 509.873,
+ "eval_steps_per_second": 32.09,
+ "step": 24500
+ },
+ {
+ "epoch": 51.76,
+ "learning_rate": 2.5e-05,
+ "loss": 0.7074,
+ "step": 25000
+ },
+ {
+ "epoch": 51.76,
+ "eval_accuracy": 0.820839580209895,
+ "eval_loss": 1.001060962677002,
+ "eval_runtime": 1.6867,
+ "eval_samples_per_second": 508.672,
+ "eval_steps_per_second": 32.014,
+ "step": 25000
+ },
+ {
+ "epoch": 52.8,
+ "learning_rate": 2.45e-05,
+ "loss": 0.739,
+ "step": 25500
+ },
+ {
+ "epoch": 52.8,
+ "eval_accuracy": 0.8330945558739254,
+ "eval_loss": 0.8732258677482605,
+ "eval_runtime": 1.6896,
+ "eval_samples_per_second": 507.823,
+ "eval_steps_per_second": 31.961,
+ "step": 25500
+ },
+ {
+ "epoch": 53.83,
+ "learning_rate": 2.4e-05,
+ "loss": 0.7243,
+ "step": 26000
+ },
+ {
+ "epoch": 53.83,
+ "eval_accuracy": 0.8479880774962743,
+ "eval_loss": 0.7857112288475037,
+ "eval_runtime": 1.687,
+ "eval_samples_per_second": 508.591,
+ "eval_steps_per_second": 32.009,
+ "step": 26000
+ },
+ {
+ "epoch": 54.87,
+ "learning_rate": 2.35e-05,
+ "loss": 0.6842,
+ "step": 26500
+ },
+ {
+ "epoch": 54.87,
+ "eval_accuracy": 0.8377192982456141,
+ "eval_loss": 0.7945135235786438,
+ "eval_runtime": 1.6902,
+ "eval_samples_per_second": 507.642,
+ "eval_steps_per_second": 31.949,
+ "step": 26500
+ },
+ {
+ "epoch": 55.9,
+ "learning_rate": 2.3000000000000003e-05,
+ "loss": 0.6991,
+ "step": 27000
+ },
+ {
+ "epoch": 55.9,
+ "eval_accuracy": 0.8275351591413768,
+ "eval_loss": 0.9627696871757507,
+ "eval_runtime": 1.6871,
+ "eval_samples_per_second": 508.578,
+ "eval_steps_per_second": 32.008,
+ "step": 27000
+ },
+ {
+ "epoch": 56.94,
+ "learning_rate": 2.25e-05,
+ "loss": 0.6896,
+ "step": 27500
+ },
+ {
+ "epoch": 56.94,
+ "eval_accuracy": 0.840960240060015,
+ "eval_loss": 0.8363039493560791,
+ "eval_runtime": 1.684,
+ "eval_samples_per_second": 509.495,
+ "eval_steps_per_second": 32.066,
+ "step": 27500
+ },
+ {
+ "epoch": 57.97,
+ "learning_rate": 2.2000000000000003e-05,
+ "loss": 0.6925,
+ "step": 28000
+ },
+ {
+ "epoch": 57.97,
+ "eval_accuracy": 0.8391812865497076,
+ "eval_loss": 0.8432921767234802,
+ "eval_runtime": 1.6968,
+ "eval_samples_per_second": 505.655,
+ "eval_steps_per_second": 31.824,
+ "step": 28000
+ },
+ {
+ "epoch": 59.01,
+ "learning_rate": 2.15e-05,
+ "loss": 0.7081,
+ "step": 28500
+ },
+ {
+ "epoch": 59.01,
+ "eval_accuracy": 0.8223048327137547,
+ "eval_loss": 1.0085676908493042,
+ "eval_runtime": 1.69,
+ "eval_samples_per_second": 507.688,
+ "eval_steps_per_second": 31.952,
+ "step": 28500
+ },
+ {
+ "epoch": 60.04,
+ "learning_rate": 2.1e-05,
+ "loss": 0.6598,
+ "step": 29000
+ },
+ {
+ "epoch": 60.04,
+ "eval_accuracy": 0.8333333333333334,
+ "eval_loss": 0.9250668883323669,
+ "eval_runtime": 1.686,
+ "eval_samples_per_second": 508.895,
+ "eval_steps_per_second": 32.028,
+ "step": 29000
+ },
+ {
+ "epoch": 61.08,
+ "learning_rate": 2.05e-05,
+ "loss": 0.6677,
+ "step": 29500
+ },
+ {
+ "epoch": 61.08,
+ "eval_accuracy": 0.8437047756874095,
+ "eval_loss": 0.8822752237319946,
+ "eval_runtime": 1.693,
+ "eval_samples_per_second": 506.807,
+ "eval_steps_per_second": 31.897,
+ "step": 29500
+ },
+ {
+ "epoch": 62.11,
+ "learning_rate": 2e-05,
+ "loss": 0.695,
+ "step": 30000
+ },
+ {
+ "epoch": 62.11,
+ "eval_accuracy": 0.8560371517027864,
+ "eval_loss": 0.7750544548034668,
+ "eval_runtime": 1.6969,
+ "eval_samples_per_second": 505.632,
+ "eval_steps_per_second": 31.823,
+ "step": 30000
+ },
+ {
+ "epoch": 63.15,
+ "learning_rate": 1.9500000000000003e-05,
+ "loss": 0.7108,
+ "step": 30500
+ },
+ {
+ "epoch": 63.15,
+ "eval_accuracy": 0.8481104651162791,
+ "eval_loss": 0.8452057242393494,
+ "eval_runtime": 1.6974,
+ "eval_samples_per_second": 505.49,
+ "eval_steps_per_second": 31.814,
+ "step": 30500
+ },
+ {
+ "epoch": 64.18,
+ "learning_rate": 1.9e-05,
+ "loss": 0.6721,
+ "step": 31000
+ },
+ {
+ "epoch": 64.18,
+ "eval_accuracy": 0.8413284132841329,
+ "eval_loss": 0.8559600114822388,
+ "eval_runtime": 1.6936,
+ "eval_samples_per_second": 506.623,
+ "eval_steps_per_second": 31.885,
+ "step": 31000
+ },
+ {
+ "epoch": 65.22,
+ "learning_rate": 1.85e-05,
+ "loss": 0.6571,
+ "step": 31500
+ },
+ {
+ "epoch": 65.22,
+ "eval_accuracy": 0.8163109756097561,
+ "eval_loss": 0.98003089427948,
+ "eval_runtime": 1.6913,
+ "eval_samples_per_second": 507.303,
+ "eval_steps_per_second": 31.928,
+ "step": 31500
+ },
+ {
+ "epoch": 66.25,
+ "learning_rate": 1.8e-05,
+ "loss": 0.6891,
+ "step": 32000
+ },
+ {
+ "epoch": 66.25,
+ "eval_accuracy": 0.8457446808510638,
+ "eval_loss": 0.8105884194374084,
+ "eval_runtime": 1.6942,
+ "eval_samples_per_second": 506.435,
+ "eval_steps_per_second": 31.874,
+ "step": 32000
+ },
+ {
+ "epoch": 67.29,
+ "learning_rate": 1.75e-05,
+ "loss": 0.6541,
+ "step": 32500
+ },
+ {
+ "epoch": 67.29,
+ "eval_accuracy": 0.8429752066115702,
+ "eval_loss": 0.8197007179260254,
+ "eval_runtime": 1.6912,
+ "eval_samples_per_second": 507.332,
+ "eval_steps_per_second": 31.93,
+ "step": 32500
+ },
+ {
+ "epoch": 68.32,
+ "learning_rate": 1.7000000000000003e-05,
+ "loss": 0.6559,
+ "step": 33000
+ },
+ {
+ "epoch": 68.32,
+ "eval_accuracy": 0.8388305847076462,
+ "eval_loss": 0.8678442239761353,
+ "eval_runtime": 1.6945,
+ "eval_samples_per_second": 506.35,
+ "eval_steps_per_second": 31.868,
+ "step": 33000
+ },
+ {
+ "epoch": 69.36,
+ "learning_rate": 1.65e-05,
+ "loss": 0.6554,
+ "step": 33500
+ },
+ {
+ "epoch": 69.36,
+ "eval_accuracy": 0.8661764705882353,
+ "eval_loss": 0.7396097183227539,
+ "eval_runtime": 1.6934,
+ "eval_samples_per_second": 506.658,
+ "eval_steps_per_second": 31.888,
+ "step": 33500
+ },
+ {
+ "epoch": 70.39,
+ "learning_rate": 1.6000000000000003e-05,
+ "loss": 0.618,
+ "step": 34000
+ },
+ {
+ "epoch": 70.39,
+ "eval_accuracy": 0.8375634517766497,
+ "eval_loss": 0.8517589569091797,
+ "eval_runtime": 1.6983,
+ "eval_samples_per_second": 505.224,
+ "eval_steps_per_second": 31.797,
+ "step": 34000
+ },
+ {
+ "epoch": 71.43,
+ "learning_rate": 1.55e-05,
+ "loss": 0.6558,
+ "step": 34500
+ },
+ {
+ "epoch": 71.43,
+ "eval_accuracy": 0.8409090909090909,
+ "eval_loss": 0.7705618739128113,
+ "eval_runtime": 1.6954,
+ "eval_samples_per_second": 506.065,
+ "eval_steps_per_second": 31.85,
+ "step": 34500
+ },
+ {
+ "epoch": 72.46,
+ "learning_rate": 1.5e-05,
+ "loss": 0.6034,
+ "step": 35000
+ },
+ {
+ "epoch": 72.46,
+ "eval_accuracy": 0.8517699115044248,
+ "eval_loss": 0.7829406261444092,
+ "eval_runtime": 1.6974,
+ "eval_samples_per_second": 505.471,
+ "eval_steps_per_second": 31.813,
+ "step": 35000
+ },
+ {
+ "epoch": 73.5,
+ "learning_rate": 1.45e-05,
+ "loss": 0.6336,
+ "step": 35500
+ },
+ {
+ "epoch": 73.5,
+ "eval_accuracy": 0.8591445427728613,
+ "eval_loss": 0.7834987640380859,
+ "eval_runtime": 1.6914,
+ "eval_samples_per_second": 507.26,
+ "eval_steps_per_second": 31.925,
+ "step": 35500
+ },
+ {
+ "epoch": 74.53,
+ "learning_rate": 1.4000000000000001e-05,
+ "loss": 0.6287,
+ "step": 36000
+ },
+ {
+ "epoch": 74.53,
+ "eval_accuracy": 0.8574748257164988,
+ "eval_loss": 0.7547706961631775,
+ "eval_runtime": 1.6906,
+ "eval_samples_per_second": 507.513,
+ "eval_steps_per_second": 31.941,
+ "step": 36000
+ },
+ {
+ "epoch": 75.57,
+ "learning_rate": 1.3500000000000001e-05,
+ "loss": 0.6065,
+ "step": 36500
+ },
+ {
+ "epoch": 75.57,
+ "eval_accuracy": 0.8508005822416302,
+ "eval_loss": 0.8541703224182129,
+ "eval_runtime": 1.6919,
+ "eval_samples_per_second": 507.134,
+ "eval_steps_per_second": 31.918,
+ "step": 36500
+ },
+ {
+ "epoch": 76.6,
+ "learning_rate": 1.3000000000000001e-05,
+ "loss": 0.6029,
+ "step": 37000
+ },
+ {
+ "epoch": 76.6,
+ "eval_accuracy": 0.8405267008046818,
+ "eval_loss": 0.8202521800994873,
+ "eval_runtime": 1.6903,
+ "eval_samples_per_second": 507.595,
+ "eval_steps_per_second": 31.947,
+ "step": 37000
+ },
+ {
+ "epoch": 77.64,
+ "learning_rate": 1.25e-05,
+ "loss": 0.6208,
+ "step": 37500
+ },
+ {
+ "epoch": 77.64,
+ "eval_accuracy": 0.8661417322834646,
+ "eval_loss": 0.7082335948944092,
+ "eval_runtime": 1.6867,
+ "eval_samples_per_second": 508.681,
+ "eval_steps_per_second": 32.015,
+ "step": 37500
+ },
+ {
+ "epoch": 78.67,
+ "learning_rate": 1.2e-05,
+ "loss": 0.64,
+ "step": 38000
+ },
+ {
+ "epoch": 78.67,
+ "eval_accuracy": 0.8410295230885693,
+ "eval_loss": 0.8504825234413147,
+ "eval_runtime": 1.6943,
+ "eval_samples_per_second": 506.417,
+ "eval_steps_per_second": 31.872,
+ "step": 38000
+ },
+ {
+ "epoch": 79.71,
+ "learning_rate": 1.1500000000000002e-05,
+ "loss": 0.6144,
+ "step": 38500
+ },
+ {
+ "epoch": 79.71,
+ "eval_accuracy": 0.8603636363636363,
+ "eval_loss": 0.7246142625808716,
+ "eval_runtime": 1.6864,
+ "eval_samples_per_second": 508.77,
+ "eval_steps_per_second": 32.02,
+ "step": 38500
+ },
+ {
+ "epoch": 80.75,
+ "learning_rate": 1.1000000000000001e-05,
+ "loss": 0.6507,
+ "step": 39000
+ },
+ {
+ "epoch": 80.75,
+ "eval_accuracy": 0.861132660977502,
+ "eval_loss": 0.7150202393531799,
+ "eval_runtime": 1.701,
+ "eval_samples_per_second": 504.398,
+ "eval_steps_per_second": 31.745,
+ "step": 39000
+ },
+ {
+ "epoch": 81.78,
+ "learning_rate": 1.05e-05,
+ "loss": 0.6177,
+ "step": 39500
+ },
+ {
+ "epoch": 81.78,
+ "eval_accuracy": 0.84,
+ "eval_loss": 0.9331970810890198,
+ "eval_runtime": 1.6939,
+ "eval_samples_per_second": 506.536,
+ "eval_steps_per_second": 31.88,
+ "step": 39500
+ },
+ {
+ "epoch": 82.82,
+ "learning_rate": 1e-05,
+ "loss": 0.6159,
+ "step": 40000
+ },
+ {
+ "epoch": 82.82,
+ "eval_accuracy": 0.8733488733488733,
+ "eval_loss": 0.6427481174468994,
+ "eval_runtime": 1.6965,
+ "eval_samples_per_second": 505.755,
+ "eval_steps_per_second": 31.831,
+ "step": 40000
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 50000,
+ "num_train_epochs": 104,
+ "save_steps": 500,
+ "total_flos": 6042662847119360.0,
+ "trial_name": null,
+ "trial_params": null
+ }
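The `log_history` above interleaves training records (carrying `loss` and `learning_rate`) with evaluation records (carrying `eval_*` keys), one pair every 500 steps. A minimal Python sketch for summarizing such a log and checking it against the configured schedule follows. It assumes the file has the shape shown above. The helpers `load_state`, `linear_lr` and `summarize` and the `SAMPLE_STATE` excerpt are hypothetical illustrations. The "linear decay to zero with no warmup" formula is inferred from the logged values, for example 5e-05 × (1 − 21500/50000) = 2.85e-05; the repository does not state it.

```python
import json
from typing import Any, Dict

# Hypothetical sample: six log_history entries copied from the diff above.
# For the full file, call load_state("trainer_state.json") instead.
SAMPLE_STATE: Dict[str, Any] = {
    "max_steps": 50000,
    "log_history": [
        {"epoch": 44.51, "learning_rate": 2.8499999999999998e-05, "loss": 0.7922, "step": 21500},
        {"epoch": 44.51, "eval_accuracy": 0.817974105102818, "eval_loss": 0.9523779153823853, "step": 21500},
        {"epoch": 69.36, "learning_rate": 1.65e-05, "loss": 0.6554, "step": 33500},
        {"epoch": 69.36, "eval_accuracy": 0.8661764705882353, "eval_loss": 0.7396097183227539, "step": 33500},
        {"epoch": 82.82, "learning_rate": 1e-05, "loss": 0.6159, "step": 40000},
        {"epoch": 82.82, "eval_accuracy": 0.8733488733488733, "eval_loss": 0.6427481174468994, "step": 40000},
    ],
}

BASE_LR = 5e-05  # learning_rate listed in the model card


def load_state(path: str) -> Dict[str, Any]:
    """Read a trainer_state.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def linear_lr(step: int, max_steps: int, base_lr: float = BASE_LR) -> float:
    """Assumed schedule: linear decay from base_lr to 0 over max_steps, no warmup."""
    return base_lr * (1 - step / max_steps)


def summarize(state: Dict[str, Any]) -> Dict[str, Any]:
    """Split log_history into training and evaluation records and summarize them."""
    history = state["log_history"]
    train = [e for e in history if "loss" in e]       # training records carry "loss"
    evals = [e for e in history if "eval_loss" in e]  # evaluation records carry "eval_*" keys
    best = max(evals, key=lambda e: e["eval_accuracy"])
    # Largest absolute gap between the logged LR and the assumed linear schedule.
    lr_gap = max(abs(e["learning_rate"] - linear_lr(e["step"], state["max_steps"])) for e in train)
    return {
        "n_train": len(train),
        "n_eval": len(evals),
        "best_step": best["step"],
        "best_eval_accuracy": best["eval_accuracy"],
        "max_lr_gap": lr_gap,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_STATE))
```

Evaluation accuracy in this log fluctuates noticeably between neighbouring checkpoints (about 0.816 at step 31500 against 0.866 at step 33500). Selecting a checkpoint by its logged `eval_accuracy` can therefore give a different answer than simply taking the last one.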
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bacabf0df5e89f749d36fdc2817cc26562dfe21a76bd4dbb1d9f549c9123cbaa
+ size 4091
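`training_args.bin` is tracked with Git LFS, so the commit stores only a pointer to it. The pointer records the spec version, the SHA-256 of the real file, and its size in bytes (4091). The actual file can be fetched with `git lfs pull`. Below is a minimal Python sketch of reading such a pointer and checking a downloaded blob against it. `parse_lfs_pointer` and `blob_matches` are hypothetical helpers, not part of the git-lfs tooling, and the parser handles only the three keys shown here.

```python
import hashlib
from typing import Dict, Union

# The pointer text as committed for training_args.bin.
POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:bacabf0df5e89f749d36fdc2817cc26562dfe21a76bd4dbb1d9f549c9123cbaa\n"
    "size 4091\n"
)


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse a Git LFS v1 pointer: one 'key value' pair per line."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def blob_matches(pointer: Dict[str, Union[str, int]], blob: bytes) -> bool:
    """True if the blob has the size and SHA-256 digest recorded in the pointer."""
    return len(blob) == pointer["size"] and hashlib.sha256(blob).hexdigest() == pointer["digest"]


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
```

Checking both size and digest is cheap. The size comparison rejects a truncated download before any hashing is needed.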
vocab.txt ADDED
The diff for this file is too large to render.