lombardata committed on
Commit
59679ee
1 Parent(s): c3e8512

Evaluation on the test set completed on 2024_09_08.

README.md ADDED
@@ -0,0 +1,169 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/dinov2-giant
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: DinoVdeau-giant-2024_08_28-batch-size32_epochs150_freeze
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # DinoVdeau-giant-2024_08_28-batch-size32_epochs150_freeze
+
+ This model is a fine-tuned version of [facebook/dinov2-giant](https://huggingface.co/facebook/dinov2-giant) on an unspecified dataset.
+ It achieves the following results on the test set:
+ - Loss: 0.1208
+ - F1 Micro: 0.8209
+ - F1 Macro: 0.7101
+ - Roc Auc: 0.8812
+ - Accuracy: 0.3080
+ - Learning Rate: 1e-08
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 150
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Accuracy | F1 Macro | F1 Micro | Validation Loss | Roc Auc | Learning Rate |
+ |:-------------:|:-----:|:-----:|:--------:|:--------:|:--------:|:---------------:|:-------:|:-------------:|
+ | No log | 1.0 | 273 | 0.2121 | 0.5175 | 0.7424 | 0.1744 | 0.8286 | 0.001 |
+ | 0.2593 | 2.0 | 546 | 0.2477 | 0.5913 | 0.7777 | 0.1514 | 0.8565 | 0.001 |
+ | 0.2593 | 3.0 | 819 | 0.2387 | 0.6203 | 0.7753 | 0.1557 | 0.8580 | 0.001 |
+ | 0.1694 | 4.0 | 1092 | 0.2495 | 0.6113 | 0.7691 | 0.1499 | 0.8373 | 0.001 |
+ | 0.1694 | 5.0 | 1365 | 0.2450 | 0.6317 | 0.7745 | 0.1577 | 0.8461 | 0.001 |
+ | 0.1637 | 6.0 | 1638 | 0.2574 | 0.6221 | 0.7803 | 0.1530 | 0.8509 | 0.001 |
+ | 0.1637 | 7.0 | 1911 | 0.2616 | 0.6318 | 0.7838 | 0.1423 | 0.8520 | 0.001 |
+ | 0.1598 | 8.0 | 2184 | 0.2592 | 0.6268 | 0.7825 | 0.1434 | 0.8521 | 0.001 |
+ | 0.1598 | 9.0 | 2457 | 0.2585 | 0.6407 | 0.7841 | 0.1432 | 0.8556 | 0.001 |
+ | 0.157 | 10.0 | 2730 | 0.2592 | 0.6350 | 0.7779 | 0.1507 | 0.8422 | 0.001 |
+ | 0.1564 | 11.0 | 3003 | 0.2685 | 0.6442 | 0.7906 | 0.1401 | 0.8599 | 0.001 |
+ | 0.1564 | 12.0 | 3276 | 0.2606 | 0.6413 | 0.7896 | 0.1404 | 0.8593 | 0.001 |
+ | 0.1556 | 13.0 | 3549 | 0.2696 | 0.6359 | 0.7822 | 0.1421 | 0.8492 | 0.001 |
+ | 0.1556 | 14.0 | 3822 | 0.2637 | 0.6460 | 0.7887 | 0.1394 | 0.8568 | 0.001 |
+ | 0.1547 | 15.0 | 4095 | 0.2554 | 0.6554 | 0.7915 | 0.1380 | 0.8576 | 0.001 |
+ | 0.1547 | 16.0 | 4368 | 0.2550 | 0.6453 | 0.7858 | 0.1441 | 0.8506 | 0.001 |
+ | 0.1539 | 17.0 | 4641 | 0.2678 | 0.6485 | 0.7904 | 0.1411 | 0.8607 | 0.001 |
+ | 0.1539 | 18.0 | 4914 | 0.2606 | 0.6549 | 0.7941 | 0.1381 | 0.8618 | 0.001 |
+ | 0.1552 | 19.0 | 5187 | 0.2654 | 0.6523 | 0.7937 | 0.1372 | 0.8604 | 0.001 |
+ | 0.1552 | 20.0 | 5460 | 0.2540 | 0.6515 | 0.7915 | 0.1396 | 0.8594 | 0.001 |
+ | 0.1531 | 21.0 | 5733 | 0.2578 | 0.6543 | 0.7925 | 0.1379 | 0.8593 | 0.001 |
+ | 0.1536 | 22.0 | 6006 | 0.2661 | 0.6524 | 0.7952 | 0.1363 | 0.8620 | 0.001 |
+ | 0.1536 | 23.0 | 6279 | 0.2710 | 0.6567 | 0.7962 | 0.1363 | 0.8595 | 0.001 |
+ | 0.1535 | 24.0 | 6552 | 0.2661 | 0.6439 | 0.7872 | 0.1401 | 0.8565 | 0.001 |
+ | 0.1535 | 25.0 | 6825 | 0.2755 | 0.6538 | 0.7961 | 0.1360 | 0.8589 | 0.001 |
+ | 0.153 | 26.0 | 7098 | 0.2692 | 0.6408 | 0.7942 | 0.1371 | 0.8612 | 0.001 |
+ | 0.153 | 27.0 | 7371 | 0.2654 | 0.6470 | 0.7902 | 0.1367 | 0.8539 | 0.001 |
+ | 0.1532 | 28.0 | 7644 | 0.2689 | 0.6427 | 0.7912 | 0.1371 | 0.8539 | 0.001 |
+ | 0.1532 | 29.0 | 7917 | 0.2692 | 0.6485 | 0.7944 | 0.1378 | 0.8597 | 0.001 |
+ | 0.1539 | 30.0 | 8190 | 0.2651 | 0.6472 | 0.7938 | 0.1364 | 0.8590 | 0.001 |
+ | 0.1539 | 31.0 | 8463 | 0.2748 | 0.6533 | 0.7999 | 0.1357 | 0.8673 | 0.001 |
+ | 0.1527 | 32.0 | 8736 | 0.2665 | 0.6620 | 0.7929 | 0.1379 | 0.8630 | 0.001 |
+ | 0.1524 | 33.0 | 9009 | 0.2730 | 0.6722 | 0.7990 | 0.1356 | 0.8643 | 0.001 |
+ | 0.1524 | 34.0 | 9282 | 0.2730 | 0.6706 | 0.7967 | 0.1347 | 0.8615 | 0.001 |
+ | 0.1516 | 35.0 | 9555 | 0.2772 | 0.6483 | 0.7947 | 0.1354 | 0.8588 | 0.001 |
+ | 0.1516 | 36.0 | 9828 | 0.2585 | 0.6553 | 0.7928 | 0.1376 | 0.8582 | 0.001 |
+ | 0.1527 | 37.0 | 10101 | 0.2748 | 0.6681 | 0.7992 | 0.1346 | 0.8638 | 0.001 |
+ | 0.1527 | 38.0 | 10374 | 0.2717 | 0.6543 | 0.7889 | 0.1378 | 0.8525 | 0.001 |
+ | 0.1503 | 39.0 | 10647 | 0.2665 | 0.6627 | 0.7965 | 0.1367 | 0.8659 | 0.001 |
+ | 0.1503 | 40.0 | 10920 | 0.2737 | 0.6702 | 0.8005 | 0.1373 | 0.8705 | 0.001 |
+ | 0.152 | 41.0 | 11193 | 0.2658 | 0.6610 | 0.7942 | 0.1377 | 0.8583 | 0.001 |
+ | 0.152 | 42.0 | 11466 | 0.2810 | 0.6706 | 0.8002 | 0.1354 | 0.8642 | 0.001 |
+ | 0.1515 | 43.0 | 11739 | 0.2651 | 0.6620 | 0.8000 | 0.1367 | 0.8699 | 0.001 |
+ | 0.147 | 44.0 | 12012 | 0.2869 | 0.6826 | 0.8087 | 0.1291 | 0.8724 | 0.0001 |
+ | 0.147 | 45.0 | 12285 | 0.2997 | 0.6939 | 0.8115 | 0.1276 | 0.8721 | 0.0001 |
+ | 0.139 | 46.0 | 12558 | 0.2959 | 0.6856 | 0.8103 | 0.1270 | 0.8700 | 0.0001 |
+ | 0.139 | 47.0 | 12831 | 0.2973 | 0.6943 | 0.8125 | 0.1269 | 0.8726 | 0.0001 |
+ | 0.1375 | 48.0 | 13104 | 0.2980 | 0.6942 | 0.8132 | 0.1262 | 0.8743 | 0.0001 |
+ | 0.1375 | 49.0 | 13377 | 0.2966 | 0.6956 | 0.8147 | 0.1263 | 0.8775 | 0.0001 |
+ | 0.1353 | 50.0 | 13650 | 0.2928 | 0.7007 | 0.8153 | 0.1258 | 0.8782 | 0.0001 |
+ | 0.1353 | 51.0 | 13923 | 0.2973 | 0.6995 | 0.8152 | 0.1257 | 0.8776 | 0.0001 |
+ | 0.1337 | 52.0 | 14196 | 0.2973 | 0.6975 | 0.8135 | 0.1250 | 0.8729 | 0.0001 |
+ | 0.1337 | 53.0 | 14469 | 0.2949 | 0.6962 | 0.8133 | 0.1248 | 0.8757 | 0.0001 |
+ | 0.1338 | 54.0 | 14742 | 0.3018 | 0.6981 | 0.8143 | 0.1247 | 0.8739 | 0.0001 |
+ | 0.1322 | 55.0 | 15015 | 0.3008 | 0.7020 | 0.8166 | 0.1245 | 0.8792 | 0.0001 |
+ | 0.1322 | 56.0 | 15288 | 0.3011 | 0.7041 | 0.8185 | 0.1244 | 0.8820 | 0.0001 |
+ | 0.1313 | 57.0 | 15561 | 0.3004 | 0.6984 | 0.8162 | 0.1239 | 0.8770 | 0.0001 |
+ | 0.1313 | 58.0 | 15834 | 0.3001 | 0.7041 | 0.8171 | 0.1236 | 0.8785 | 0.0001 |
+ | 0.1309 | 59.0 | 16107 | 0.3049 | 0.7019 | 0.8159 | 0.1237 | 0.8758 | 0.0001 |
+ | 0.1309 | 60.0 | 16380 | 0.2990 | 0.7008 | 0.8153 | 0.1234 | 0.8731 | 0.0001 |
+ | 0.13 | 61.0 | 16653 | 0.3025 | 0.7083 | 0.8189 | 0.1229 | 0.8791 | 0.0001 |
+ | 0.13 | 62.0 | 16926 | 0.3028 | 0.7055 | 0.8166 | 0.1227 | 0.8767 | 0.0001 |
+ | 0.1288 | 63.0 | 17199 | 0.3039 | 0.7106 | 0.8176 | 0.1230 | 0.8774 | 0.0001 |
+ | 0.1288 | 64.0 | 17472 | 0.3049 | 0.7086 | 0.8192 | 0.1233 | 0.8803 | 0.0001 |
+ | 0.1291 | 65.0 | 17745 | 0.3049 | 0.7104 | 0.8188 | 0.1231 | 0.8798 | 0.0001 |
+ | 0.1283 | 66.0 | 18018 | 0.3028 | 0.7061 | 0.8186 | 0.1219 | 0.8789 | 0.0001 |
+ | 0.1283 | 67.0 | 18291 | 0.3042 | 0.7155 | 0.8197 | 0.1229 | 0.8823 | 0.0001 |
+ | 0.1273 | 68.0 | 18564 | 0.3080 | 0.7153 | 0.8210 | 0.1225 | 0.8844 | 0.0001 |
+ | 0.1273 | 69.0 | 18837 | 0.3032 | 0.7102 | 0.8196 | 0.1222 | 0.8799 | 0.0001 |
+ | 0.1265 | 70.0 | 19110 | 0.3084 | 0.7109 | 0.8185 | 0.1223 | 0.8768 | 0.0001 |
+ | 0.1265 | 71.0 | 19383 | 0.3077 | 0.7120 | 0.8170 | 0.1224 | 0.8737 | 0.0001 |
+ | 0.1264 | 72.0 | 19656 | 0.3063 | 0.7204 | 0.8204 | 0.1221 | 0.8803 | 0.0001 |
+ | 0.1264 | 73.0 | 19929 | 0.3087 | 0.7144 | 0.8198 | 0.1217 | 0.8798 | 1e-05 |
+ | 0.1249 | 74.0 | 20202 | 0.3067 | 0.7124 | 0.8190 | 0.1215 | 0.8757 | 1e-05 |
+ | 0.1249 | 75.0 | 20475 | 0.3056 | 0.7145 | 0.8209 | 0.1212 | 0.8796 | 1e-05 |
+ | 0.1236 | 76.0 | 20748 | 0.3080 | 0.7191 | 0.8219 | 0.1216 | 0.8822 | 1e-05 |
+ | 0.1233 | 77.0 | 21021 | 0.3132 | 0.7203 | 0.8237 | 0.1214 | 0.8868 | 1e-05 |
+ | 0.1233 | 78.0 | 21294 | 0.3098 | 0.7168 | 0.8223 | 0.1211 | 0.8823 | 1e-05 |
+ | 0.123 | 79.0 | 21567 | 0.3067 | 0.7161 | 0.8203 | 0.1215 | 0.8783 | 1e-05 |
+ | 0.123 | 80.0 | 21840 | 0.3073 | 0.7151 | 0.8219 | 0.1216 | 0.8847 | 1e-05 |
+ | 0.123 | 81.0 | 22113 | 0.3115 | 0.7187 | 0.8216 | 0.1210 | 0.8808 | 1e-05 |
+ | 0.123 | 82.0 | 22386 | 0.3094 | 0.7157 | 0.8212 | 0.1208 | 0.8794 | 1e-05 |
+ | 0.1214 | 83.0 | 22659 | 0.3001 | 0.7102 | 0.8180 | 0.1215 | 0.8751 | 1e-05 |
+ | 0.1214 | 84.0 | 22932 | 0.3119 | 0.7196 | 0.8216 | 0.1210 | 0.8817 | 1e-05 |
+ | 0.1234 | 85.0 | 23205 | 0.3101 | 0.7201 | 0.8234 | 0.1208 | 0.8835 | 1e-05 |
+ | 0.1234 | 86.0 | 23478 | 0.3094 | 0.7215 | 0.8218 | 0.1210 | 0.8813 | 1e-05 |
+ | 0.1216 | 87.0 | 23751 | 0.3087 | 0.7142 | 0.8207 | 0.1212 | 0.8796 | 1e-05 |
+ | 0.1219 | 88.0 | 24024 | 0.3101 | 0.7125 | 0.8224 | 0.1210 | 0.8824 | 1e-05 |
+ | 0.1219 | 89.0 | 24297 | 0.3122 | 0.7250 | 0.8241 | 0.1214 | 0.8876 | 0.0000 |
+ | 0.1219 | 90.0 | 24570 | 0.3105 | 0.7199 | 0.8234 | 0.1212 | 0.8864 | 0.0000 |
+ | 0.1219 | 91.0 | 24843 | 0.3098 | 0.7160 | 0.8212 | 0.1208 | 0.8790 | 0.0000 |
+ | 0.1213 | 92.0 | 25116 | 0.3073 | 0.7144 | 0.8224 | 0.1207 | 0.8807 | 0.0000 |
+ | 0.1213 | 93.0 | 25389 | 0.3080 | 0.7189 | 0.8227 | 0.1209 | 0.8834 | 0.0000 |
+ | 0.122 | 94.0 | 25662 | 0.3098 | 0.7188 | 0.8223 | 0.1209 | 0.8828 | 0.0000 |
+ | 0.122 | 95.0 | 25935 | 0.3094 | 0.7127 | 0.8222 | 0.1207 | 0.8807 | 0.0000 |
+ | 0.1209 | 96.0 | 26208 | 0.3067 | 0.7160 | 0.8218 | 0.1214 | 0.8821 | 0.0000 |
+ | 0.1209 | 97.0 | 26481 | 0.3094 | 0.7159 | 0.8209 | 0.1226 | 0.8793 | 0.0000 |
+ | 0.122 | 98.0 | 26754 | 0.3119 | 0.7190 | 0.8225 | 0.1210 | 0.8843 | 0.0000 |
+ | 0.1218 | 99.0 | 27027 | 0.3098 | 0.7177 | 0.8214 | 0.1208 | 0.8803 | 0.0000 |
+ | 0.1218 | 100.0 | 27300 | 0.3108 | 0.7191 | 0.8219 | 0.1208 | 0.8794 | 0.0000 |
+ | 0.1222 | 101.0 | 27573 | 0.3098 | 0.7199 | 0.8231 | 0.1207 | 0.8825 | 0.0000 |
+ | 0.1222 | 102.0 | 27846 | 0.3101 | 0.7181 | 0.8216 | 0.1210 | 0.8797 | 0.0000 |
+ | 0.1212 | 103.0 | 28119 | 0.3112 | 0.7156 | 0.8219 | 0.1207 | 0.8799 | 0.0000 |
+ | 0.1212 | 104.0 | 28392 | 0.3091 | 0.7151 | 0.8214 | 0.1212 | 0.8810 | 0.0000 |
+ | 0.1204 | 105.0 | 28665 | 0.3084 | 0.7175 | 0.8216 | 0.1208 | 0.8822 | 0.0000 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.1
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
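The card reports F1 Micro 0.8209 but Accuracy only 0.3080. That gap is expected if Accuracy is exact-match (subset) accuracy, as scikit-learn's `accuracy_score` computes for multilabel input: an image counts as correct only when every one of its labels is right. F1 Macro averages per-label F1 scores, so rare labels the model misses pull it below F1 Micro. The card does not state how its metrics were computed; the pure-Python sketch below assumes those standard definitions and runs them on a made-up 4-image, 4-label example (toy data, not outputs of this model). `f1_scores` and `subset_accuracy` are hypothetical helper names.

```python
from typing import List, Tuple

Labels = List[List[int]]  # one row of 0/1 flags per image, one column per label


def f1_scores(y_true: Labels, y_pred: Labels) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multilabel 0/1 matrices."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0  # 0.0 when undefined

    micro = f1(sum(tp), sum(fp), sum(fn))  # pool counts over all labels
    macro = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels
    return micro, macro


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of images whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: the last label is rare and never predicted.
Y_TRUE = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
Y_PRED = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 0]]

micro, macro = f1_scores(Y_TRUE, Y_PRED)
acc = subset_accuracy(Y_TRUE, Y_PRED)
print(f"{micro:.3f} {macro:.3f} {acc:.3f}")  # 0.769 0.625 0.500
```

The ordering micro > macro > subset accuracy in the toy output mirrors the ordering in the card, which is what these definitions predict when a few labels are rare or hard.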
all_results.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "epoch": 105.0,
+ "eval_accuracy": 0.3079584775086505,
+ "eval_f1_macro": 0.7100502784895705,
+ "eval_f1_micro": 0.8209253855773239,
+ "eval_loss": 0.12079350650310516,
+ "eval_roc_auc": 0.881227166005117,
+ "eval_runtime": 747.5462,
+ "eval_samples_per_second": 3.866,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 1.0000000000000004e-08,
+ "total_flos": 5.049640374682393e+21,
+ "train_loss": 0.023157235795491324,
+ "train_runtime": 62002.1626,
+ "train_samples_per_second": 21.086,
+ "train_steps_per_second": 0.66
+ }
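The hyperparameter list says `lr_scheduler_type: linear`, yet the Learning Rate column of the training-results table drops by exact factors of 10 (0.001 to 0.0001 at epoch 44, then to 1e-05 at epoch 73), and `all_results.json` reports a final rate of about 1e-08, i.e. five such drops from 0.001. A linear decay from 0.001 to zero over 150 epochs (273 steps each), assuming no warmup, would instead still be near 7e-04 at epoch 44. The sketch below only checks these numbers from the commit; it does not establish which scheduler or callback actually produced the drops. `change_points` and `linear_lr` are hypothetical helpers, and the epoch ranges are read off the rounded table (epochs 89 to 105 are omitted because they display as 0.0000).

```python
import math

BASE_LR = 1e-3                        # learning_rate in the card
STEPS_PER_EPOCH = 273                 # step 273 at epoch 1.0 in the table
TOTAL_STEPS = 150 * STEPS_PER_EPOCH   # num_epochs: 150
FINAL_LR = 1.0000000000000004e-08     # learning_rate in all_results.json

# Displayed (rounded) learning rate per epoch, from the results table.
displayed = {e: 1e-3 for e in range(1, 44)}
displayed.update({e: 1e-4 for e in range(44, 73)})
displayed.update({e: 1e-5 for e in range(73, 89)})


def change_points(lr_by_epoch):
    """Return (epoch, old_lr / new_lr) for each epoch where the rate drops."""
    epochs = sorted(lr_by_epoch)
    drops = []
    for prev, cur in zip(epochs, epochs[1:]):
        if lr_by_epoch[cur] < lr_by_epoch[prev]:
            drops.append((cur, round(lr_by_epoch[prev] / lr_by_epoch[cur], 6)))
    return drops


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Linear decay from base_lr to zero with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


print(change_points(displayed))                   # [(44, 10.0), (73, 10.0)]
print(round(math.log10(BASE_LR / FINAL_LR)))      # 5
print(f"{linear_lr(44 * STEPS_PER_EPOCH):.2e}")   # 7.07e-04
```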
logs/events.out.tfevents.1725747475.datavisu2 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:86b0b3e48f9ea71be84bf7f868813cc988170b2d6d5840434e0df348eb3577b6
- size 18905
+ oid sha256:eb22e709a1b193c266697bb6c867060e9968976102a4a4aa4b39d960095cbcf9
+ size 21177
logs/events.out.tfevents.1725810610.datavisu2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef7a9b6072520a2b3166bc6e9a347078b768eee0fde086a19f59da9146e0cd2c
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:97f320ba862cacdc1aecf6d17be76f8394bc548bc399367668aae8c0c576d39b
+ oid sha256:e7b99ad31a1fcbff7adcc01bc35e0b91b11e135ba3ec83c49beda93dd7e21f09
  size 4569746364
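`model.safetensors` is stored through Git LFS: the repository tracks only a pointer whose `oid` is the SHA-256 of the file's bytes and whose `size` is its length in bytes. Here the `oid` changes while the size stays at 4569746364 bytes, which is consistent with new weights of identical shapes. The standard-library sketch below checks a locally downloaded file against such a pointer; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS or Hugging Face tool.

```python
import hashlib
import os

# New pointer for model.safetensors, copied from the diff above.
MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e7b99ad31a1fcbff7adcc01bc35e0b91b11e135ba3ec83c49beda93dd7e21f09
size 4569746364
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,  # "sha256" for spec v1 pointers
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's byte size and content hash against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first: skips hashing a wrong-sized file
    digest = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so a multi-GB checkpoint never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(MODEL_POINTER)
print(pointer["algo"], pointer["size"])  # sha256 4569746364
```

On a downloaded copy, `matches_pointer("model.safetensors", pointer)` should return `True` only for the revision introduced by this commit.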
test_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 105.0,
+ "eval_accuracy": 0.3079584775086505,
+ "eval_f1_macro": 0.7100502784895705,
+ "eval_f1_micro": 0.8209253855773239,
+ "eval_loss": 0.12079350650310516,
+ "eval_roc_auc": 0.881227166005117,
+ "eval_runtime": 747.5462,
+ "eval_samples_per_second": 3.866,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 1.0000000000000004e-08
+ }
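The card does not state the size of the test set, but `test_results.json` allows a back-of-envelope estimate: runtime multiplied by throughput gives about 2,890 images, and with that count the reported accuracy corresponds to a whole number of images (890), as an exact-match accuracy should. Treat both numbers as inferences from rounded logging values, not documented facts. The sketch below reproduces the arithmetic and cross-checks it against the batch count implied by `eval_steps_per_second` and `eval_batch_size: 32`.

```python
import json
import math

# Fields copied from test_results.json above.
TEST_RESULTS = json.loads("""{
  "eval_accuracy": 0.3079584775086505,
  "eval_runtime": 747.5462,
  "eval_samples_per_second": 3.866,
  "eval_steps_per_second": 0.122
}""")

EVAL_BATCH_SIZE = 32  # eval_batch_size from the card's hyperparameters

# Runtime x throughput estimates the number of evaluated images. Throughput is
# rounded to 3 decimals, so the estimate is only good to about +/- 0.4 images.
n_samples = round(TEST_RESULTS["eval_runtime"] * TEST_RESULTS["eval_samples_per_second"])

# If Accuracy is exact-match accuracy, this should be (close to) an integer.
exact_matches = TEST_RESULTS["eval_accuracy"] * n_samples

# Cross-check: batches needed for n_samples vs. batches implied by steps/second.
n_batches = math.ceil(n_samples / EVAL_BATCH_SIZE)
est_batches = TEST_RESULTS["eval_runtime"] * TEST_RESULTS["eval_steps_per_second"]

print(n_samples)                          # 2890
print(round(exact_matches, 6))            # 890.0
print(n_batches, round(est_batches, 1))   # 91 91.2
```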
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 105.0,
+ "learning_rate": 1.0000000000000004e-08,
+ "total_flos": 5.049640374682393e+21,
+ "train_loss": 0.023157235795491324,
+ "train_runtime": 62002.1626,
+ "train_samples_per_second": 21.086,
+ "train_steps_per_second": 0.66
+ }
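Comparing the three JSON files in this commit, `all_results.json` holds exactly the keys of `train_results.json` plus those of `test_results.json`, with identical values; the shared keys `epoch` and `learning_rate` agree. That is consistent with the Transformers `Trainer.save_metrics(split, metrics, combined=True)` behaviour of merging each split's metrics into `all_results.json`, although how these particular files were written is an inference. The sketch below, with a hypothetical `merge_results` helper, checks the relationship and refuses to merge conflicting values.

```python
import json

# The three files added in this commit, copied verbatim.
TRAIN = json.loads("""{
  "epoch": 105.0, "learning_rate": 1.0000000000000004e-08,
  "total_flos": 5.049640374682393e+21, "train_loss": 0.023157235795491324,
  "train_runtime": 62002.1626, "train_samples_per_second": 21.086,
  "train_steps_per_second": 0.66
}""")
TEST = json.loads("""{
  "epoch": 105.0, "eval_accuracy": 0.3079584775086505,
  "eval_f1_macro": 0.7100502784895705, "eval_f1_micro": 0.8209253855773239,
  "eval_loss": 0.12079350650310516, "eval_roc_auc": 0.881227166005117,
  "eval_runtime": 747.5462, "eval_samples_per_second": 3.866,
  "eval_steps_per_second": 0.122, "learning_rate": 1.0000000000000004e-08
}""")
ALL = json.loads("""{
  "epoch": 105.0, "eval_accuracy": 0.3079584775086505,
  "eval_f1_macro": 0.7100502784895705, "eval_f1_micro": 0.8209253855773239,
  "eval_loss": 0.12079350650310516, "eval_roc_auc": 0.881227166005117,
  "eval_runtime": 747.5462, "eval_samples_per_second": 3.866,
  "eval_steps_per_second": 0.122, "learning_rate": 1.0000000000000004e-08,
  "total_flos": 5.049640374682393e+21, "train_loss": 0.023157235795491324,
  "train_runtime": 62002.1626, "train_samples_per_second": 21.086,
  "train_steps_per_second": 0.66
}""")


def merge_results(*parts: dict) -> dict:
    """Union of metric dicts; a key present in several parts must agree."""
    merged = {}
    for part in parts:
        for key, value in part.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for {key!r}")
            merged[key] = value
    return merged


print(merge_results(TRAIN, TEST) == ALL)  # True
print(sorted(set(TRAIN) & set(TEST)))     # ['epoch', 'learning_rate']
```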
trainer_state.json ADDED
@@ -0,0 +1,1816 @@
+ {
+ "best_metric": 0.1206900030374527,
+ "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/fine_scale/DinoVdeau-giant-2024_08_28-batch-size32_epochs150_freeze/checkpoint-25935",
+ "epoch": 105.0,
+ "eval_steps": 500,
+ "global_step": 28665,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.21205821205821207,
+ "eval_f1_macro": 0.5175126673232894,
+ "eval_f1_micro": 0.7424333879451582,
+ "eval_loss": 0.17437300086021423,
+ "eval_roc_auc": 0.8285535192873753,
+ "eval_runtime": 747.1492,
+ "eval_samples_per_second": 3.863,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 273
+ },
+ {
+ "epoch": 1.8315018315018317,
+ "grad_norm": 0.29891085624694824,
+ "learning_rate": 0.001,
+ "loss": 0.2593,
+ "step": 500
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.24774774774774774,
+ "eval_f1_macro": 0.5912510936495889,
+ "eval_f1_micro": 0.7776526996039191,
+ "eval_loss": 0.1514047533273697,
+ "eval_roc_auc": 0.856455760350861,
+ "eval_runtime": 745.2688,
+ "eval_samples_per_second": 3.872,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 546
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.23873873873873874,
+ "eval_f1_macro": 0.6203462640123141,
+ "eval_f1_micro": 0.7752795082305376,
+ "eval_loss": 0.1557399332523346,
+ "eval_roc_auc": 0.8580342914691714,
+ "eval_runtime": 748.2805,
+ "eval_samples_per_second": 3.857,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 819
+ },
+ {
+ "epoch": 3.663003663003663,
+ "grad_norm": 0.24181818962097168,
+ "learning_rate": 0.001,
+ "loss": 0.1694,
+ "step": 1000
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.2494802494802495,
+ "eval_f1_macro": 0.6112936548561337,
+ "eval_f1_micro": 0.7691087713115115,
+ "eval_loss": 0.1499096304178238,
+ "eval_roc_auc": 0.8372664798756062,
+ "eval_runtime": 747.4138,
+ "eval_samples_per_second": 3.861,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 1092
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.24497574497574498,
+ "eval_f1_macro": 0.6316545255681125,
+ "eval_f1_micro": 0.7744962975718961,
+ "eval_loss": 0.15773828327655792,
+ "eval_roc_auc": 0.8461026726645842,
+ "eval_runtime": 747.0386,
+ "eval_samples_per_second": 3.863,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 1365
+ },
+ {
+ "epoch": 5.4945054945054945,
+ "grad_norm": 0.17729038000106812,
+ "learning_rate": 0.001,
+ "loss": 0.1637,
+ "step": 1500
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.25744975744975745,
+ "eval_f1_macro": 0.6220908262048482,
+ "eval_f1_micro": 0.7803354441211706,
+ "eval_loss": 0.1529887616634369,
+ "eval_roc_auc": 0.8508892919574323,
+ "eval_runtime": 747.6468,
+ "eval_samples_per_second": 3.86,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 1638
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.2616077616077616,
+ "eval_f1_macro": 0.6318272608971183,
+ "eval_f1_micro": 0.7837652308220353,
+ "eval_loss": 0.14232446253299713,
+ "eval_roc_auc": 0.8519980061789139,
+ "eval_runtime": 743.8547,
+ "eval_samples_per_second": 3.88,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 1911
+ },
+ {
+ "epoch": 7.326007326007326,
+ "grad_norm": 0.21456240117549896,
+ "learning_rate": 0.001,
+ "loss": 0.1598,
+ "step": 2000
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.2591822591822592,
+ "eval_f1_macro": 0.6268140575796306,
+ "eval_f1_micro": 0.7824785045129828,
+ "eval_loss": 0.14342056214809418,
+ "eval_roc_auc": 0.8521029956678926,
+ "eval_runtime": 745.5826,
+ "eval_samples_per_second": 3.871,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 2184
+ },
+ {
+ "epoch": 9.0,
+ "eval_accuracy": 0.25848925848925847,
+ "eval_f1_macro": 0.6406683603322132,
+ "eval_f1_micro": 0.7840562521179261,
+ "eval_loss": 0.14322087168693542,
+ "eval_roc_auc": 0.8556312702614824,
+ "eval_runtime": 746.6555,
+ "eval_samples_per_second": 3.865,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 2457
+ },
+ {
+ "epoch": 9.157509157509157,
+ "grad_norm": 0.17193137109279633,
+ "learning_rate": 0.001,
+ "loss": 0.157,
+ "step": 2500
+ },
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.2591822591822592,
+ "eval_f1_macro": 0.6350156993693012,
+ "eval_f1_micro": 0.7779440239394473,
+ "eval_loss": 0.15065954625606537,
+ "eval_roc_auc": 0.8421810798397646,
+ "eval_runtime": 749.2424,
+ "eval_samples_per_second": 3.852,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 2730
+ },
+ {
+ "epoch": 10.989010989010989,
+ "grad_norm": 0.17156100273132324,
+ "learning_rate": 0.001,
+ "loss": 0.1564,
+ "step": 3000
+ },
+ {
+ "epoch": 11.0,
+ "eval_accuracy": 0.26853776853776856,
+ "eval_f1_macro": 0.6442254017268965,
+ "eval_f1_micro": 0.7905542412977358,
+ "eval_loss": 0.14012028276920319,
+ "eval_roc_auc": 0.8599228950325096,
+ "eval_runtime": 743.9581,
+ "eval_samples_per_second": 3.879,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 3003
+ },
+ {
+ "epoch": 12.0,
+ "eval_accuracy": 0.26056826056826055,
+ "eval_f1_macro": 0.6412994039301575,
+ "eval_f1_micro": 0.7896027049873203,
+ "eval_loss": 0.14037516713142395,
+ "eval_roc_auc": 0.8592624552114599,
+ "eval_runtime": 747.0487,
+ "eval_samples_per_second": 3.863,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 3276
+ },
+ {
+ "epoch": 12.820512820512821,
+ "grad_norm": 0.14995847642421722,
+ "learning_rate": 0.001,
+ "loss": 0.1556,
+ "step": 3500
+ },
+ {
+ "epoch": 13.0,
+ "eval_accuracy": 0.2695772695772696,
+ "eval_f1_macro": 0.6359393136512833,
+ "eval_f1_micro": 0.7822141560798549,
+ "eval_loss": 0.1420680731534958,
+ "eval_roc_auc": 0.8492469381754499,
+ "eval_runtime": 742.4635,
+ "eval_samples_per_second": 3.887,
+ "eval_steps_per_second": 0.123,
+ "learning_rate": 0.001,
+ "step": 3549
+ },
+ {
+ "epoch": 14.0,
+ "eval_accuracy": 0.2636867636867637,
+ "eval_f1_macro": 0.6459907944955716,
+ "eval_f1_micro": 0.7887275978034142,
+ "eval_loss": 0.13944004476070404,
+ "eval_roc_auc": 0.8568078446879906,
+ "eval_runtime": 744.9297,
+ "eval_samples_per_second": 3.874,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 3822
+ },
+ {
+ "epoch": 14.652014652014651,
+ "grad_norm": 0.1688154637813568,
+ "learning_rate": 0.001,
+ "loss": 0.1547,
+ "step": 4000
+ },
+ {
+ "epoch": 15.0,
+ "eval_accuracy": 0.2553707553707554,
+ "eval_f1_macro": 0.6554204386045119,
+ "eval_f1_micro": 0.7915315007683115,
+ "eval_loss": 0.13796783983707428,
+ "eval_roc_auc": 0.8575869560318454,
+ "eval_runtime": 749.7594,
+ "eval_samples_per_second": 3.849,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 4095
+ },
+ {
+ "epoch": 16.0,
+ "eval_accuracy": 0.255024255024255,
+ "eval_f1_macro": 0.6452554527968026,
+ "eval_f1_micro": 0.7857792404624779,
+ "eval_loss": 0.1441228836774826,
+ "eval_roc_auc": 0.8505811645074093,
+ "eval_runtime": 751.9487,
+ "eval_samples_per_second": 3.838,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 4368
+ },
+ {
+ "epoch": 16.483516483516482,
+ "grad_norm": 0.15101341903209686,
+ "learning_rate": 0.001,
+ "loss": 0.1539,
+ "step": 4500
+ },
+ {
+ "epoch": 17.0,
+ "eval_accuracy": 0.26784476784476785,
+ "eval_f1_macro": 0.6485416937632181,
+ "eval_f1_micro": 0.7904489177124567,
+ "eval_loss": 0.14113685488700867,
+ "eval_roc_auc": 0.8607338640531657,
+ "eval_runtime": 751.954,
+ "eval_samples_per_second": 3.838,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 4641
+ },
+ {
+ "epoch": 18.0,
+ "eval_accuracy": 0.26056826056826055,
+ "eval_f1_macro": 0.654854199500387,
+ "eval_f1_micro": 0.7940517933336151,
+ "eval_loss": 0.1381485015153885,
+ "eval_roc_auc": 0.8618218271900107,
+ "eval_runtime": 756.1006,
+ "eval_samples_per_second": 3.817,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 0.001,
+ "step": 4914
+ },
+ {
+ "epoch": 18.315018315018314,
+ "grad_norm": 0.17647762596607208,
+ "learning_rate": 0.001,
+ "loss": 0.1552,
+ "step": 5000
+ },
+ {
+ "epoch": 19.0,
+ "eval_accuracy": 0.2654192654192654,
+ "eval_f1_macro": 0.6522812524843972,
+ "eval_f1_micro": 0.793669650812508,
+ "eval_loss": 0.13720253109931946,
+ "eval_roc_auc": 0.8604083523719281,
+ "eval_runtime": 753.1197,
+ "eval_samples_per_second": 3.832,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 5187
+ },
+ {
+ "epoch": 20.0,
+ "eval_accuracy": 0.253984753984754,
+ "eval_f1_macro": 0.6515497507659908,
+ "eval_f1_micro": 0.791502353390154,
+ "eval_loss": 0.13964051008224487,
+ "eval_roc_auc": 0.8593941380801585,
+ "eval_runtime": 760.0428,
+ "eval_samples_per_second": 3.797,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 0.001,
+ "step": 5460
+ },
+ {
+ "epoch": 20.146520146520146,
+ "grad_norm": 0.15846939384937286,
+ "learning_rate": 0.001,
+ "loss": 0.1531,
+ "step": 5500
+ },
348
+ {
349
+ "epoch": 21.0,
350
+ "eval_accuracy": 0.2577962577962578,
351
+ "eval_f1_macro": 0.6542904488686327,
352
+ "eval_f1_micro": 0.7925025501530093,
353
+ "eval_loss": 0.13785456120967865,
354
+ "eval_roc_auc": 0.8592903826569759,
355
+ "eval_runtime": 757.5213,
356
+ "eval_samples_per_second": 3.81,
357
+ "eval_steps_per_second": 0.12,
358
+ "learning_rate": 0.001,
359
+ "step": 5733
360
+ },
361
+ {
362
+ "epoch": 21.978021978021978,
363
+ "grad_norm": 0.16983690857887268,
364
+ "learning_rate": 0.001,
365
+ "loss": 0.1536,
366
+ "step": 6000
367
+ },
368
+ {
369
+ "epoch": 22.0,
370
+ "eval_accuracy": 0.2661122661122661,
371
+ "eval_f1_macro": 0.6524154901292529,
372
+ "eval_f1_micro": 0.7952276188864443,
373
+ "eval_loss": 0.13633865118026733,
374
+ "eval_roc_auc": 0.8620495257431491,
375
+ "eval_runtime": 758.4735,
376
+ "eval_samples_per_second": 3.805,
377
+ "eval_steps_per_second": 0.12,
378
+ "learning_rate": 0.001,
379
+ "step": 6006
380
+ },
381
+ {
382
+ "epoch": 23.0,
383
+ "eval_accuracy": 0.27096327096327094,
384
+ "eval_f1_macro": 0.656651787807274,
385
+ "eval_f1_micro": 0.7961679924728424,
386
+ "eval_loss": 0.13627886772155762,
387
+ "eval_roc_auc": 0.8595478597543244,
388
+ "eval_runtime": 753.7633,
389
+ "eval_samples_per_second": 3.829,
390
+ "eval_steps_per_second": 0.121,
391
+ "learning_rate": 0.001,
392
+ "step": 6279
393
+ },
394
+ {
395
+ "epoch": 23.80952380952381,
396
+ "grad_norm": 0.1691550612449646,
397
+ "learning_rate": 0.001,
398
+ "loss": 0.1535,
399
+ "step": 6500
400
+ },
401
+ {
402
+ "epoch": 24.0,
403
+ "eval_accuracy": 0.2661122661122661,
404
+ "eval_f1_macro": 0.6438900918479138,
405
+ "eval_f1_micro": 0.7871861324722778,
406
+ "eval_loss": 0.14012865722179413,
407
+ "eval_roc_auc": 0.8565085837324373,
408
+ "eval_runtime": 758.2383,
409
+ "eval_samples_per_second": 3.806,
410
+ "eval_steps_per_second": 0.12,
411
+ "learning_rate": 0.001,
412
+ "step": 6552
413
+ },
414
+ {
415
+ "epoch": 25.0,
416
+ "eval_accuracy": 0.27546777546777546,
417
+ "eval_f1_macro": 0.6538094573584412,
418
+ "eval_f1_micro": 0.7960565795113589,
419
+ "eval_loss": 0.1359640210866928,
420
+ "eval_roc_auc": 0.8588707063899927,
421
+ "eval_runtime": 765.0178,
422
+ "eval_samples_per_second": 3.772,
423
+ "eval_steps_per_second": 0.119,
424
+ "learning_rate": 0.001,
425
+ "step": 6825
426
+ },
427
+ {
428
+ "epoch": 25.641025641025642,
429
+ "grad_norm": 0.14603881537914276,
430
+ "learning_rate": 0.001,
431
+ "loss": 0.153,
432
+ "step": 7000
433
+ },
434
+ {
435
+ "epoch": 26.0,
436
+ "eval_accuracy": 0.2692307692307692,
437
+ "eval_f1_macro": 0.6407905722004358,
438
+ "eval_f1_micro": 0.7942222975262623,
439
+ "eval_loss": 0.1370791494846344,
440
+ "eval_roc_auc": 0.8611700794845683,
441
+ "eval_runtime": 750.2435,
442
+ "eval_samples_per_second": 3.847,
443
+ "eval_steps_per_second": 0.121,
444
+ "learning_rate": 0.001,
445
+ "step": 7098
446
+ },
447
+ {
448
+ "epoch": 27.0,
449
+ "eval_accuracy": 0.2654192654192654,
450
+ "eval_f1_macro": 0.6469565906332285,
451
+ "eval_f1_micro": 0.7902460077686664,
452
+ "eval_loss": 0.13669614493846893,
453
+ "eval_roc_auc": 0.8538650806596136,
454
+ "eval_runtime": 744.2164,
455
+ "eval_samples_per_second": 3.878,
456
+ "eval_steps_per_second": 0.122,
457
+ "learning_rate": 0.001,
458
+ "step": 7371
459
+ },
460
+ {
461
+ "epoch": 27.47252747252747,
462
+ "grad_norm": 0.1542704999446869,
463
+ "learning_rate": 0.001,
464
+ "loss": 0.1532,
465
+ "step": 7500
466
+ },
467
+ {
468
+ "epoch": 28.0,
469
+ "eval_accuracy": 0.26888426888426886,
470
+ "eval_f1_macro": 0.642689033704319,
471
+ "eval_f1_micro": 0.7912144926283021,
472
+ "eval_loss": 0.1371130496263504,
473
+ "eval_roc_auc": 0.8539010295328042,
474
+ "eval_runtime": 744.0106,
475
+ "eval_samples_per_second": 3.879,
476
+ "eval_steps_per_second": 0.122,
477
+ "learning_rate": 0.001,
478
+ "step": 7644
479
+ },
480
+ {
481
+ "epoch": 29.0,
482
+ "eval_accuracy": 0.2692307692307692,
483
+ "eval_f1_macro": 0.6484600603294314,
484
+ "eval_f1_micro": 0.7944120277694962,
485
+ "eval_loss": 0.13781629502773285,
486
+ "eval_roc_auc": 0.8597476308619466,
487
+ "eval_runtime": 751.4281,
488
+ "eval_samples_per_second": 3.841,
489
+ "eval_steps_per_second": 0.121,
490
+ "learning_rate": 0.001,
491
+ "step": 7917
492
+ },
493
+ {
494
+ "epoch": 29.304029304029303,
+ "grad_norm": 0.15774671733379364,
+ "learning_rate": 0.001,
+ "loss": 0.1539,
+ "step": 8000
+ },
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.26507276507276506,
+ "eval_f1_macro": 0.6472439075890195,
+ "eval_f1_micro": 0.7938241064573914,
+ "eval_loss": 0.13641151785850525,
+ "eval_roc_auc": 0.8590391831986771,
+ "eval_runtime": 743.8204,
+ "eval_samples_per_second": 3.88,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 8190
+ },
+ {
+ "epoch": 31.0,
+ "eval_accuracy": 0.2747747747747748,
+ "eval_f1_macro": 0.6533472550118105,
+ "eval_f1_micro": 0.7999161777032691,
+ "eval_loss": 0.13565559685230255,
+ "eval_roc_auc": 0.8672849828142924,
+ "eval_runtime": 745.046,
+ "eval_samples_per_second": 3.874,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 8463
+ },
+ {
+ "epoch": 31.135531135531135,
+ "grad_norm": 0.15824691951274872,
+ "learning_rate": 0.001,
+ "loss": 0.1527,
+ "step": 8500
+ },
+ {
+ "epoch": 32.0,
+ "eval_accuracy": 0.2664587664587665,
+ "eval_f1_macro": 0.662032330499469,
+ "eval_f1_micro": 0.7928646379853095,
+ "eval_loss": 0.137930765748024,
+ "eval_roc_auc": 0.8629893205019107,
+ "eval_runtime": 747.6199,
+ "eval_samples_per_second": 3.86,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 8736
+ },
+ {
+ "epoch": 32.967032967032964,
+ "grad_norm": 0.17653779685497284,
+ "learning_rate": 0.001,
+ "loss": 0.1524,
+ "step": 9000
+ },
+ {
+ "epoch": 33.0,
+ "eval_accuracy": 0.273042273042273,
+ "eval_f1_macro": 0.6722007856831675,
+ "eval_f1_micro": 0.7989514185446704,
+ "eval_loss": 0.13557712733745575,
+ "eval_roc_auc": 0.8642597778252326,
+ "eval_runtime": 743.3529,
+ "eval_samples_per_second": 3.882,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 9009
+ },
+ {
+ "epoch": 34.0,
+ "eval_accuracy": 0.273042273042273,
+ "eval_f1_macro": 0.670590685863264,
+ "eval_f1_micro": 0.7966670917825107,
+ "eval_loss": 0.1347290426492691,
+ "eval_roc_auc": 0.8614922779674185,
+ "eval_runtime": 743.2445,
+ "eval_samples_per_second": 3.883,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 9282
+ },
+ {
+ "epoch": 34.798534798534796,
+ "grad_norm": 0.15610426664352417,
+ "learning_rate": 0.001,
+ "loss": 0.1516,
+ "step": 9500
+ },
+ {
+ "epoch": 35.0,
+ "eval_accuracy": 0.2772002772002772,
+ "eval_f1_macro": 0.6482708127714739,
+ "eval_f1_micro": 0.7946646145953571,
+ "eval_loss": 0.13544337451457977,
+ "eval_roc_auc": 0.8588431142884431,
+ "eval_runtime": 750.5786,
+ "eval_samples_per_second": 3.845,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 9555
+ },
+ {
+ "epoch": 36.0,
+ "eval_accuracy": 0.25848925848925847,
+ "eval_f1_macro": 0.6552995006011981,
+ "eval_f1_micro": 0.7927604900328681,
+ "eval_loss": 0.13763058185577393,
+ "eval_roc_auc": 0.8582396561141522,
+ "eval_runtime": 746.9319,
+ "eval_samples_per_second": 3.864,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 9828
+ },
+ {
+ "epoch": 36.63003663003663,
+ "grad_norm": 0.176735520362854,
+ "learning_rate": 0.001,
+ "loss": 0.1527,
+ "step": 10000
+ },
+ {
+ "epoch": 37.0,
+ "eval_accuracy": 0.2747747747747748,
+ "eval_f1_macro": 0.6680976075122991,
+ "eval_f1_micro": 0.7992204380799051,
+ "eval_loss": 0.13456694781780243,
+ "eval_roc_auc": 0.8638335422302681,
+ "eval_runtime": 744.024,
+ "eval_samples_per_second": 3.879,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 10101
+ },
+ {
+ "epoch": 38.0,
+ "eval_accuracy": 0.27165627165627165,
+ "eval_f1_macro": 0.6543467314054483,
+ "eval_f1_micro": 0.7889066758966815,
+ "eval_loss": 0.13784632086753845,
+ "eval_roc_auc": 0.8524819477636044,
+ "eval_runtime": 745.3518,
+ "eval_samples_per_second": 3.872,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 10374
+ },
+ {
+ "epoch": 38.46153846153846,
+ "grad_norm": 0.16059936583042145,
+ "learning_rate": 0.001,
+ "loss": 0.1503,
+ "step": 10500
+ },
+ {
+ "epoch": 39.0,
+ "eval_accuracy": 0.2664587664587665,
+ "eval_f1_macro": 0.6627442989440849,
+ "eval_f1_micro": 0.7965357098029371,
+ "eval_loss": 0.13671767711639404,
+ "eval_roc_auc": 0.865910488378856,
+ "eval_runtime": 745.9061,
+ "eval_samples_per_second": 3.869,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 10647
+ },
+ {
+ "epoch": 40.0,
+ "eval_accuracy": 0.27373527373527373,
+ "eval_f1_macro": 0.670153584497431,
+ "eval_f1_micro": 0.8004978220286246,
+ "eval_loss": 0.13730555772781372,
+ "eval_roc_auc": 0.8705375510125241,
+ "eval_runtime": 744.6796,
+ "eval_samples_per_second": 3.875,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 10920
+ },
+ {
+ "epoch": 40.29304029304029,
+ "grad_norm": 0.16920654475688934,
+ "learning_rate": 0.001,
+ "loss": 0.152,
+ "step": 11000
+ },
+ {
+ "epoch": 41.0,
+ "eval_accuracy": 0.26576576576576577,
+ "eval_f1_macro": 0.6610276871242879,
+ "eval_f1_micro": 0.7942296990711015,
+ "eval_loss": 0.13770104944705963,
+ "eval_roc_auc": 0.8582536198369102,
+ "eval_runtime": 744.9969,
+ "eval_samples_per_second": 3.874,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.001,
+ "step": 11193
+ },
+ {
+ "epoch": 42.0,
+ "eval_accuracy": 0.28101178101178104,
+ "eval_f1_macro": 0.6705886094654014,
+ "eval_f1_micro": 0.8001525876319246,
+ "eval_loss": 0.13536451756954193,
+ "eval_roc_auc": 0.8642216961644161,
+ "eval_runtime": 751.3727,
+ "eval_samples_per_second": 3.841,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 11466
+ },
+ {
+ "epoch": 42.124542124542124,
+ "grad_norm": 0.1676277071237564,
+ "learning_rate": 0.001,
+ "loss": 0.1515,
+ "step": 11500
+ },
+ {
+ "epoch": 43.0,
+ "eval_accuracy": 0.26507276507276506,
+ "eval_f1_macro": 0.6619628883017729,
+ "eval_f1_micro": 0.8000498525196295,
+ "eval_loss": 0.13665379583835602,
+ "eval_roc_auc": 0.8698817657657271,
+ "eval_runtime": 749.8198,
+ "eval_samples_per_second": 3.849,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.001,
+ "step": 11739
+ },
+ {
+ "epoch": 43.956043956043956,
+ "grad_norm": 0.15791508555412292,
+ "learning_rate": 0.0001,
+ "loss": 0.147,
+ "step": 12000
+ },
+ {
+ "epoch": 44.0,
+ "eval_accuracy": 0.2869022869022869,
+ "eval_f1_macro": 0.6825865030851337,
+ "eval_f1_micro": 0.808658516161447,
+ "eval_loss": 0.12908011674880981,
+ "eval_roc_auc": 0.8723907154255005,
+ "eval_runtime": 750.2309,
+ "eval_samples_per_second": 3.847,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 12012
+ },
+ {
+ "epoch": 45.0,
+ "eval_accuracy": 0.29972279972279975,
+ "eval_f1_macro": 0.6938587241702103,
+ "eval_f1_micro": 0.811512367788968,
+ "eval_loss": 0.12761357426643372,
+ "eval_roc_auc": 0.8720936945676423,
+ "eval_runtime": 758.8984,
+ "eval_samples_per_second": 3.803,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 0.0001,
+ "step": 12285
+ },
+ {
+ "epoch": 45.78754578754579,
+ "grad_norm": 0.16074201464653015,
+ "learning_rate": 0.0001,
+ "loss": 0.139,
+ "step": 12500
+ },
+ {
+ "epoch": 46.0,
+ "eval_accuracy": 0.2959112959112959,
+ "eval_f1_macro": 0.6856377454961721,
+ "eval_f1_micro": 0.8103163511624953,
+ "eval_loss": 0.12698666751384735,
+ "eval_roc_auc": 0.8699996458767716,
+ "eval_runtime": 752.5715,
+ "eval_samples_per_second": 3.835,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 12558
+ },
+ {
+ "epoch": 47.0,
+ "eval_accuracy": 0.2972972972972973,
+ "eval_f1_macro": 0.6942647446672258,
+ "eval_f1_micro": 0.8124920976103174,
+ "eval_loss": 0.12690682709217072,
+ "eval_roc_auc": 0.8725812846946867,
+ "eval_runtime": 759.108,
+ "eval_samples_per_second": 3.802,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 0.0001,
+ "step": 12831
+ },
+ {
+ "epoch": 47.61904761904762,
+ "grad_norm": 0.17895784974098206,
+ "learning_rate": 0.0001,
+ "loss": 0.1375,
+ "step": 13000
+ },
+ {
+ "epoch": 48.0,
+ "eval_accuracy": 0.29799029799029797,
+ "eval_f1_macro": 0.694151320978192,
+ "eval_f1_micro": 0.8131711409395973,
+ "eval_loss": 0.12617328763008118,
+ "eval_roc_auc": 0.8743386078020858,
+ "eval_runtime": 767.7622,
+ "eval_samples_per_second": 3.759,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 0.0001,
+ "step": 13104
+ },
+ {
+ "epoch": 49.0,
+ "eval_accuracy": 0.2966042966042966,
+ "eval_f1_macro": 0.6956458198072734,
+ "eval_f1_micro": 0.8147346514047868,
+ "eval_loss": 0.1263018250465393,
+ "eval_roc_auc": 0.8774737921983433,
+ "eval_runtime": 752.7691,
+ "eval_samples_per_second": 3.834,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 13377
+ },
+ {
+ "epoch": 49.45054945054945,
+ "grad_norm": 0.22477330267429352,
+ "learning_rate": 0.0001,
+ "loss": 0.1353,
+ "step": 13500
+ },
+ {
+ "epoch": 50.0,
+ "eval_accuracy": 0.2927927927927928,
+ "eval_f1_macro": 0.7006577033751422,
+ "eval_f1_micro": 0.8153475224476222,
+ "eval_loss": 0.1258096992969513,
+ "eval_roc_auc": 0.8781952512075065,
+ "eval_runtime": 751.7275,
+ "eval_samples_per_second": 3.839,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 13650
+ },
+ {
+ "epoch": 51.0,
+ "eval_accuracy": 0.2972972972972973,
+ "eval_f1_macro": 0.6994505755010588,
+ "eval_f1_micro": 0.8151571934207786,
+ "eval_loss": 0.12573884427547455,
+ "eval_roc_auc": 0.8775850056713371,
+ "eval_runtime": 754.5773,
+ "eval_samples_per_second": 3.825,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 13923
+ },
+ {
+ "epoch": 51.282051282051285,
+ "grad_norm": 0.1825592815876007,
+ "learning_rate": 0.0001,
+ "loss": 0.1337,
+ "step": 14000
+ },
+ {
+ "epoch": 52.0,
+ "eval_accuracy": 0.2972972972972973,
+ "eval_f1_macro": 0.6974514657531053,
+ "eval_f1_micro": 0.8134649455833967,
+ "eval_loss": 0.12501972913742065,
+ "eval_roc_auc": 0.8728563740571469,
+ "eval_runtime": 748.8299,
+ "eval_samples_per_second": 3.854,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 14196
+ },
+ {
+ "epoch": 53.0,
+ "eval_accuracy": 0.2948717948717949,
+ "eval_f1_macro": 0.6962280886309719,
+ "eval_f1_micro": 0.8132960287301124,
+ "eval_loss": 0.12481856346130371,
+ "eval_roc_auc": 0.8757195542554345,
+ "eval_runtime": 754.8846,
+ "eval_samples_per_second": 3.823,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 14469
+ },
+ {
+ "epoch": 53.11355311355312,
+ "grad_norm": 0.16182786226272583,
+ "learning_rate": 0.0001,
+ "loss": 0.1338,
+ "step": 14500
+ },
+ {
+ "epoch": 54.0,
+ "eval_accuracy": 0.30180180180180183,
+ "eval_f1_macro": 0.6980743235485474,
+ "eval_f1_micro": 0.8143470573377115,
+ "eval_loss": 0.12473563104867935,
+ "eval_roc_auc": 0.8739288040614714,
+ "eval_runtime": 764.2531,
+ "eval_samples_per_second": 3.776,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 0.0001,
+ "step": 14742
+ },
+ {
+ "epoch": 54.94505494505494,
+ "grad_norm": 0.22775864601135254,
+ "learning_rate": 0.0001,
+ "loss": 0.1322,
+ "step": 15000
+ },
+ {
+ "epoch": 55.0,
+ "eval_accuracy": 0.30076230076230076,
+ "eval_f1_macro": 0.7020497284253308,
+ "eval_f1_micro": 0.8165587111775452,
+ "eval_loss": 0.12453257292509079,
+ "eval_roc_auc": 0.8792131676966645,
+ "eval_runtime": 758.6078,
+ "eval_samples_per_second": 3.804,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 0.0001,
+ "step": 15015
+ },
+ {
+ "epoch": 56.0,
+ "eval_accuracy": 0.3011088011088011,
+ "eval_f1_macro": 0.7041152638460181,
+ "eval_f1_micro": 0.8185497191939213,
+ "eval_loss": 0.12440259009599686,
+ "eval_roc_auc": 0.8819546448626913,
+ "eval_runtime": 755.48,
+ "eval_samples_per_second": 3.82,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 0.0001,
+ "step": 15288
+ },
+ {
+ "epoch": 56.776556776556774,
+ "grad_norm": 0.26265445351600647,
+ "learning_rate": 0.0001,
+ "loss": 0.1313,
+ "step": 15500
+ },
+ {
+ "epoch": 57.0,
+ "eval_accuracy": 0.3004158004158004,
+ "eval_f1_macro": 0.6984123654445143,
+ "eval_f1_micro": 0.8162207357859533,
+ "eval_loss": 0.12393573671579361,
+ "eval_roc_auc": 0.8770029692696153,
+ "eval_runtime": 749.3127,
+ "eval_samples_per_second": 3.852,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 15561
+ },
+ {
+ "epoch": 58.0,
+ "eval_accuracy": 0.30006930006930005,
+ "eval_f1_macro": 0.7041206694443728,
+ "eval_f1_micro": 0.8171478565179352,
+ "eval_loss": 0.12355069816112518,
+ "eval_roc_auc": 0.8785400518736873,
+ "eval_runtime": 751.5939,
+ "eval_samples_per_second": 3.84,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 15834
+ },
+ {
+ "epoch": 58.608058608058606,
+ "grad_norm": 0.19159354269504547,
+ "learning_rate": 0.0001,
+ "loss": 0.1309,
+ "step": 16000
+ },
+ {
+ "epoch": 59.0,
+ "eval_accuracy": 0.3049203049203049,
+ "eval_f1_macro": 0.701908769020469,
+ "eval_f1_micro": 0.8158932617269447,
+ "eval_loss": 0.1237163171172142,
+ "eval_roc_auc": 0.8757623441455382,
+ "eval_runtime": 749.4527,
+ "eval_samples_per_second": 3.851,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 16107
+ },
+ {
+ "epoch": 60.0,
+ "eval_accuracy": 0.29902979902979904,
+ "eval_f1_macro": 0.7008492179245241,
+ "eval_f1_micro": 0.8152564590468943,
+ "eval_loss": 0.12339853495359421,
+ "eval_roc_auc": 0.8731348839280636,
+ "eval_runtime": 748.7843,
+ "eval_samples_per_second": 3.854,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 16380
+ },
+ {
+ "epoch": 60.43956043956044,
+ "grad_norm": 0.19487616419792175,
+ "learning_rate": 0.0001,
+ "loss": 0.13,
+ "step": 16500
+ },
+ {
+ "epoch": 61.0,
+ "eval_accuracy": 0.3024948024948025,
+ "eval_f1_macro": 0.7083200505706103,
+ "eval_f1_micro": 0.8188720173535793,
+ "eval_loss": 0.12294851988554001,
+ "eval_roc_auc": 0.8791109816832443,
+ "eval_runtime": 752.7718,
+ "eval_samples_per_second": 3.834,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 16653
+ },
+ {
+ "epoch": 62.0,
+ "eval_accuracy": 0.30284130284130284,
+ "eval_f1_macro": 0.7054890147149661,
+ "eval_f1_micro": 0.8166017506386899,
+ "eval_loss": 0.12270853668451309,
+ "eval_roc_auc": 0.876682675540494,
+ "eval_runtime": 746.0294,
+ "eval_samples_per_second": 3.868,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 16926
+ },
+ {
+ "epoch": 62.27106227106227,
+ "grad_norm": 0.20640559494495392,
+ "learning_rate": 0.0001,
+ "loss": 0.1288,
+ "step": 17000
+ },
+ {
+ "epoch": 63.0,
+ "eval_accuracy": 0.3038808038808039,
+ "eval_f1_macro": 0.7105833307429198,
+ "eval_f1_micro": 0.8176490288010717,
+ "eval_loss": 0.12301415950059891,
+ "eval_roc_auc": 0.8773957777780161,
+ "eval_runtime": 748.1364,
+ "eval_samples_per_second": 3.858,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 17199
+ },
+ {
+ "epoch": 64.0,
+ "eval_accuracy": 0.3049203049203049,
+ "eval_f1_macro": 0.7085844813380441,
+ "eval_f1_micro": 0.8191759178412541,
+ "eval_loss": 0.12328237295150757,
+ "eval_roc_auc": 0.880258676287372,
+ "eval_runtime": 749.8061,
+ "eval_samples_per_second": 3.849,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 0.0001,
+ "step": 17472
+ },
+ {
+ "epoch": 64.1025641025641,
+ "grad_norm": 0.2363331913948059,
+ "learning_rate": 0.0001,
+ "loss": 0.1291,
+ "step": 17500
+ },
+ {
+ "epoch": 65.0,
+ "eval_accuracy": 0.3049203049203049,
+ "eval_f1_macro": 0.7103887558295827,
+ "eval_f1_micro": 0.8187567612548888,
+ "eval_loss": 0.12309526652097702,
+ "eval_roc_auc": 0.8798153918051592,
+ "eval_runtime": 745.0937,
+ "eval_samples_per_second": 3.873,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 17745
+ },
+ {
+ "epoch": 65.93406593406593,
+ "grad_norm": 0.26966458559036255,
+ "learning_rate": 0.0001,
+ "loss": 0.1283,
+ "step": 18000
+ },
+ {
+ "epoch": 66.0,
+ "eval_accuracy": 0.30284130284130284,
+ "eval_f1_macro": 0.7061406642055487,
+ "eval_f1_micro": 0.8186407442947141,
+ "eval_loss": 0.12194398790597916,
+ "eval_roc_auc": 0.8789458717279818,
+ "eval_runtime": 744.2128,
+ "eval_samples_per_second": 3.878,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 18018
+ },
+ {
+ "epoch": 67.0,
+ "eval_accuracy": 0.3042273042273042,
+ "eval_f1_macro": 0.7154558287425048,
+ "eval_f1_micro": 0.8196775527077305,
+ "eval_loss": 0.12292120605707169,
+ "eval_roc_auc": 0.8822622625898855,
+ "eval_runtime": 743.6955,
+ "eval_samples_per_second": 3.881,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 18291
+ },
+ {
+ "epoch": 67.76556776556777,
+ "grad_norm": 0.2636018991470337,
+ "learning_rate": 0.0001,
+ "loss": 0.1273,
+ "step": 18500
+ },
+ {
+ "epoch": 68.0,
+ "eval_accuracy": 0.30803880803880807,
+ "eval_f1_macro": 0.7153434473934246,
+ "eval_f1_micro": 0.8209686046990085,
+ "eval_loss": 0.12254418432712555,
+ "eval_roc_auc": 0.8843888396454903,
+ "eval_runtime": 743.6093,
+ "eval_samples_per_second": 3.881,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 18564
+ },
+ {
+ "epoch": 69.0,
+ "eval_accuracy": 0.3031878031878032,
+ "eval_f1_macro": 0.7101570111652898,
+ "eval_f1_micro": 0.8195983668027664,
+ "eval_loss": 0.12215162813663483,
+ "eval_roc_auc": 0.87988510310888,
+ "eval_runtime": 744.98,
+ "eval_samples_per_second": 3.874,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 18837
+ },
+ {
+ "epoch": 69.59706959706959,
+ "grad_norm": 0.19965404272079468,
+ "learning_rate": 0.0001,
+ "loss": 0.1265,
+ "step": 19000
+ },
+ {
+ "epoch": 70.0,
+ "eval_accuracy": 0.30838530838530837,
+ "eval_f1_macro": 0.7109091736321397,
+ "eval_f1_micro": 0.8184682603033231,
+ "eval_loss": 0.12227334082126617,
+ "eval_roc_auc": 0.8767948413903521,
+ "eval_runtime": 744.4872,
+ "eval_samples_per_second": 3.876,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 19110
+ },
+ {
+ "epoch": 71.0,
+ "eval_accuracy": 0.3076923076923077,
+ "eval_f1_macro": 0.7120407268503043,
+ "eval_f1_micro": 0.8170385739086251,
+ "eval_loss": 0.12237659096717834,
+ "eval_roc_auc": 0.8737123194105673,
+ "eval_runtime": 747.0787,
+ "eval_samples_per_second": 3.863,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 19383
+ },
+ {
+ "epoch": 71.42857142857143,
+ "grad_norm": 0.2734057903289795,
+ "learning_rate": 0.0001,
+ "loss": 0.1264,
+ "step": 19500
+ },
+ {
+ "epoch": 72.0,
+ "eval_accuracy": 0.3063063063063063,
+ "eval_f1_macro": 0.7203981522602361,
+ "eval_f1_micro": 0.8203632727878687,
+ "eval_loss": 0.1220996230840683,
+ "eval_roc_auc": 0.8803336591982435,
+ "eval_runtime": 742.9487,
+ "eval_samples_per_second": 3.885,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 0.0001,
+ "step": 19656
+ },
+ {
+ "epoch": 73.0,
+ "eval_accuracy": 0.3087318087318087,
+ "eval_f1_macro": 0.7144193511981376,
+ "eval_f1_micro": 0.8198457369189076,
+ "eval_loss": 0.12169401347637177,
+ "eval_roc_auc": 0.8798110725748728,
+ "eval_runtime": 752.9878,
+ "eval_samples_per_second": 3.833,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 19929
+ },
+ {
+ "epoch": 73.26007326007326,
+ "grad_norm": 0.20597431063652039,
+ "learning_rate": 1e-05,
+ "loss": 0.1249,
+ "step": 20000
+ },
+ {
+ "epoch": 74.0,
+ "eval_accuracy": 0.30665280665280664,
+ "eval_f1_macro": 0.7124121424308173,
+ "eval_f1_micro": 0.8190452070406484,
+ "eval_loss": 0.12149834632873535,
+ "eval_roc_auc": 0.8757233637628921,
+ "eval_runtime": 756.5322,
+ "eval_samples_per_second": 3.815,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 1e-05,
+ "step": 20202
+ },
+ {
+ "epoch": 75.0,
+ "eval_accuracy": 0.30561330561330563,
+ "eval_f1_macro": 0.7145366354361308,
+ "eval_f1_micro": 0.8208643316893754,
+ "eval_loss": 0.12120900303125381,
+ "eval_roc_auc": 0.879641026356426,
+ "eval_runtime": 752.1644,
+ "eval_samples_per_second": 3.837,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 20475
+ },
+ {
+ "epoch": 75.0915750915751,
+ "grad_norm": 0.25457698106765747,
+ "learning_rate": 1e-05,
+ "loss": 0.1236,
+ "step": 20500
+ },
+ {
+ "epoch": 76.0,
+ "eval_accuracy": 0.30803880803880807,
+ "eval_f1_macro": 0.7191205487713891,
+ "eval_f1_micro": 0.8218541121766927,
+ "eval_loss": 0.1215985044836998,
+ "eval_roc_auc": 0.8821938390069956,
+ "eval_runtime": 752.3495,
+ "eval_samples_per_second": 3.836,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 20748
+ },
+ {
+ "epoch": 76.92307692307692,
+ "grad_norm": 0.2589890658855438,
+ "learning_rate": 1e-05,
+ "loss": 0.1233,
+ "step": 21000
+ },
+ {
+ "epoch": 77.0,
+ "eval_accuracy": 0.31323631323631324,
+ "eval_f1_macro": 0.7202749659896155,
+ "eval_f1_micro": 0.8236983547367989,
+ "eval_loss": 0.1214083805680275,
+ "eval_roc_auc": 0.8867951606378082,
+ "eval_runtime": 755.0282,
+ "eval_samples_per_second": 3.822,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 21021
+ },
+ {
+ "epoch": 78.0,
+ "eval_accuracy": 0.3097713097713098,
+ "eval_f1_macro": 0.7168480610158249,
+ "eval_f1_micro": 0.8222591362126246,
+ "eval_loss": 0.12110316008329391,
+ "eval_roc_auc": 0.8823316922046746,
+ "eval_runtime": 752.7354,
+ "eval_samples_per_second": 3.834,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 21294
+ },
+ {
+ "epoch": 78.75457875457876,
+ "grad_norm": 0.26676803827285767,
+ "learning_rate": 1e-05,
+ "loss": 0.123,
+ "step": 21500
+ },
+ {
+ "epoch": 79.0,
+ "eval_accuracy": 0.30665280665280664,
+ "eval_f1_macro": 0.7160500850094047,
+ "eval_f1_micro": 0.8202977563430488,
+ "eval_loss": 0.12149946391582489,
+ "eval_roc_auc": 0.878321716124089,
+ "eval_runtime": 752.3192,
+ "eval_samples_per_second": 3.836,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 21567
+ },
+ {
+ "epoch": 80.0,
+ "eval_accuracy": 0.30734580734580735,
+ "eval_f1_macro": 0.7150848378423871,
+ "eval_f1_micro": 0.8219257062844905,
+ "eval_loss": 0.121590256690979,
+ "eval_roc_auc": 0.8846639290079505,
+ "eval_runtime": 747.5776,
+ "eval_samples_per_second": 3.86,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 1e-05,
+ "step": 21840
+ },
+ {
+ "epoch": 80.58608058608058,
+ "grad_norm": 0.2525629699230194,
+ "learning_rate": 1e-05,
+ "loss": 0.123,
+ "step": 22000
+ },
+ {
+ "epoch": 81.0,
+ "eval_accuracy": 0.3115038115038115,
+ "eval_f1_macro": 0.7187103786018064,
+ "eval_f1_micro": 0.8216162121591194,
+ "eval_loss": 0.12097962200641632,
+ "eval_roc_auc": 0.8807537244642276,
+ "eval_runtime": 755.3491,
+ "eval_samples_per_second": 3.821,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 1e-05,
+ "step": 22113
+ },
+ {
+ "epoch": 82.0,
+ "eval_accuracy": 0.30942480942480943,
+ "eval_f1_macro": 0.7156786549052798,
+ "eval_f1_micro": 0.821175978238125,
+ "eval_loss": 0.12082336097955704,
+ "eval_roc_auc": 0.8794272915260457,
+ "eval_runtime": 753.7414,
+ "eval_samples_per_second": 3.829,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 22386
+ },
+ {
+ "epoch": 82.41758241758242,
+ "grad_norm": 0.23939679563045502,
+ "learning_rate": 1e-05,
+ "loss": 0.1214,
+ "step": 22500
+ },
+ {
+ "epoch": 83.0,
+ "eval_accuracy": 0.30006930006930005,
+ "eval_f1_macro": 0.7102312532643303,
+ "eval_f1_micro": 0.8180206046275968,
+ "eval_loss": 0.12147542089223862,
+ "eval_roc_auc": 0.8750765523206339,
+ "eval_runtime": 745.2706,
+ "eval_samples_per_second": 3.872,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 1e-05,
+ "step": 22659
+ },
+ {
+ "epoch": 84.0,
+ "eval_accuracy": 0.31185031185031187,
+ "eval_f1_macro": 0.7195842513107142,
+ "eval_f1_micro": 0.8215978053038491,
+ "eval_loss": 0.12100570648908615,
+ "eval_roc_auc": 0.8816901523695672,
+ "eval_runtime": 742.6349,
+ "eval_samples_per_second": 3.886,
+ "eval_steps_per_second": 0.123,
+ "learning_rate": 1e-05,
+ "step": 22932
+ },
+ {
+ "epoch": 84.24908424908425,
+ "grad_norm": 0.30801209807395935,
+ "learning_rate": 1e-05,
+ "loss": 0.1234,
+ "step": 23000
+ },
+ {
+ "epoch": 85.0,
+ "eval_accuracy": 0.31011781011781014,
+ "eval_f1_macro": 0.7201395616901511,
+ "eval_f1_micro": 0.8233587533156498,
+ "eval_loss": 0.1208326444029808,
+ "eval_roc_auc": 0.8835425924395763,
+ "eval_runtime": 742.1618,
+ "eval_samples_per_second": 3.889,
+ "eval_steps_per_second": 0.123,
+ "learning_rate": 1e-05,
+ "step": 23205
+ },
+ {
+ "epoch": 86.0,
+ "eval_accuracy": 0.30942480942480943,
+ "eval_f1_macro": 0.7215167678270465,
+ "eval_f1_micro": 0.8218151540383014,
+ "eval_loss": 0.1210438683629036,
+ "eval_roc_auc": 0.8813373302757117,
+ "eval_runtime": 754.9986,
+ "eval_samples_per_second": 3.823,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 23478
+ },
+ {
+ "epoch": 86.08058608058609,
+ "grad_norm": 0.23295313119888306,
+ "learning_rate": 1e-05,
+ "loss": 0.1216,
+ "step": 23500
+ },
+ {
+ "epoch": 87.0,
+ "eval_accuracy": 0.3087318087318087,
+ "eval_f1_macro": 0.7141558876633265,
+ "eval_f1_micro": 0.8207271207689094,
+ "eval_loss": 0.1212099939584732,
+ "eval_roc_auc": 0.8796150036646389,
+ "eval_runtime": 753.045,
+ "eval_samples_per_second": 3.832,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1e-05,
+ "step": 23751
+ },
+ {
+ "epoch": 87.91208791208791,
+ "grad_norm": 0.21838252246379852,
+ "learning_rate": 1e-05,
+ "loss": 0.1219,
+ "step": 24000
+ },
+ {
+ "epoch": 88.0,
+ "eval_accuracy": 0.31011781011781014,
+ "eval_f1_macro": 0.7124615854591595,
+ "eval_f1_micro": 0.8223957468017943,
+ "eval_loss": 0.12096676975488663,
+ "eval_roc_auc": 0.8823577148964619,
+ "eval_runtime": 758.2188,
+ "eval_samples_per_second": 3.806,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 1e-05,
+ "step": 24024
+ },
+ {
+ "epoch": 89.0,
+ "eval_accuracy": 0.3121968121968122,
+ "eval_f1_macro": 0.7249978662662346,
+ "eval_f1_micro": 0.8240642149234173,
+ "eval_loss": 0.12144902348518372,
+ "eval_roc_auc": 0.8875640104562932,
+ "eval_runtime": 760.5663,
+ "eval_samples_per_second": 3.795,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 24297
+ },
+ {
+ "epoch": 89.74358974358974,
+ "grad_norm": 0.21705362200737,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.1219,
+ "step": 24500
+ },
+ {
+ "epoch": 90.0,
+ "eval_accuracy": 0.31046431046431044,
+ "eval_f1_macro": 0.7198781344667567,
+ "eval_f1_micro": 0.8233893154847453,
+ "eval_loss": 0.12115956842899323,
+ "eval_roc_auc": 0.8863713931744356,
+ "eval_runtime": 763.5088,
+ "eval_samples_per_second": 3.78,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 24570
+ },
+ {
+ "epoch": 91.0,
+ "eval_accuracy": 0.3097713097713098,
+ "eval_f1_macro": 0.7159843095789674,
+ "eval_f1_micro": 0.8212459126351974,
+ "eval_loss": 0.1208055168390274,
+ "eval_roc_auc": 0.8789555162204534,
+ "eval_runtime": 757.8368,
+ "eval_samples_per_second": 3.808,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 24843
+ },
+ {
+ "epoch": 91.57509157509158,
+ "grad_norm": 0.23301896452903748,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.1213,
+ "step": 25000
+ },
+ {
+ "epoch": 92.0,
+ "eval_accuracy": 0.30734580734580735,
+ "eval_f1_macro": 0.7144036362020703,
+ "eval_f1_micro": 0.8223893065998329,
+ "eval_loss": 0.12069901078939438,
+ "eval_roc_auc": 0.8806577087797879,
+ "eval_runtime": 763.0077,
+ "eval_samples_per_second": 3.782,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 25116
+ },
+ {
+ "epoch": 93.0,
+ "eval_accuracy": 0.30803880803880807,
+ "eval_f1_macro": 0.7189178649032102,
+ "eval_f1_micro": 0.8226574468966088,
+ "eval_loss": 0.12093978375196457,
+ "eval_roc_auc": 0.8834391187053254,
+ "eval_runtime": 763.7654,
+ "eval_samples_per_second": 3.779,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 25389
+ },
+ {
+ "epoch": 93.4065934065934,
1572
+ "grad_norm": 0.2630571126937866,
1573
+ "learning_rate": 1.0000000000000002e-06,
1574
+ "loss": 0.122,
1575
+ "step": 25500
1576
+ },
1577
+ {
1578
+ "epoch": 94.0,
1579
+ "eval_accuracy": 0.3097713097713098,
1580
+ "eval_f1_macro": 0.7187657914933285,
1581
+ "eval_f1_micro": 0.8223438666334908,
1582
+ "eval_loss": 0.12092197686433792,
1583
+ "eval_roc_auc": 0.8828028504773688,
1584
+ "eval_runtime": 758.2573,
1585
+ "eval_samples_per_second": 3.806,
1586
+ "eval_steps_per_second": 0.12,
1587
+ "learning_rate": 1.0000000000000002e-06,
1588
+ "step": 25662
1589
+ },
1590
+ {
+ "epoch": 95.0,
+ "eval_accuracy": 0.30942480942480943,
+ "eval_f1_macro": 0.7127077698746517,
+ "eval_f1_micro": 0.8221934621968021,
+ "eval_loss": 0.1206900030374527,
+ "eval_roc_auc": 0.8807116052620565,
+ "eval_runtime": 755.4845,
+ "eval_samples_per_second": 3.82,
+ "eval_steps_per_second": 0.12,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 25935
+ },
+ {
+ "epoch": 95.23809523809524,
+ "grad_norm": 0.32719686627388,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.1209,
+ "step": 26000
+ },
+ {
+ "epoch": 96.0,
+ "eval_accuracy": 0.30665280665280664,
+ "eval_f1_macro": 0.7160309422692305,
+ "eval_f1_micro": 0.8218438538205979,
+ "eval_loss": 0.12142115086317062,
+ "eval_roc_auc": 0.882100908487046,
+ "eval_runtime": 764.2068,
+ "eval_samples_per_second": 3.776,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 26208
+ },
+ {
+ "epoch": 97.0,
+ "eval_accuracy": 0.30942480942480943,
+ "eval_f1_macro": 0.71586766610014,
+ "eval_f1_micro": 0.8208711661575798,
+ "eval_loss": 0.12264719605445862,
+ "eval_roc_auc": 0.879308955347207,
+ "eval_runtime": 783.117,
+ "eval_samples_per_second": 3.685,
+ "eval_steps_per_second": 0.116,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 26481
+ },
+ {
+ "epoch": 97.06959706959707,
+ "grad_norm": 0.27319103479385376,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.122,
+ "step": 26500
+ },
+ {
+ "epoch": 98.0,
+ "eval_accuracy": 0.31185031185031187,
+ "eval_f1_macro": 0.7190138873820752,
+ "eval_f1_micro": 0.8224561403508771,
+ "eval_loss": 0.12095578759908676,
+ "eval_roc_auc": 0.8842500877259815,
+ "eval_runtime": 761.8672,
+ "eval_samples_per_second": 3.788,
+ "eval_steps_per_second": 0.119,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 26754
+ },
+ {
+ "epoch": 98.9010989010989,
+ "grad_norm": 0.314969539642334,
+ "learning_rate": 1.0000000000000002e-07,
+ "loss": 0.1218,
+ "step": 27000
+ },
+ {
+ "epoch": 99.0,
+ "eval_accuracy": 0.3097713097713098,
+ "eval_f1_macro": 0.7177436878101541,
+ "eval_f1_micro": 0.821403230518803,
+ "eval_loss": 0.12075632065534592,
+ "eval_roc_auc": 0.8803494740196957,
+ "eval_runtime": 749.7836,
+ "eval_samples_per_second": 3.849,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 27027
+ },
+ {
+ "epoch": 100.0,
+ "eval_accuracy": 0.3108108108108108,
+ "eval_f1_macro": 0.7191112023643382,
+ "eval_f1_micro": 0.8218776194467728,
+ "eval_loss": 0.12078335881233215,
+ "eval_roc_auc": 0.8793780496180298,
+ "eval_runtime": 751.4627,
+ "eval_samples_per_second": 3.841,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 27300
+ },
+ {
+ "epoch": 100.73260073260073,
+ "grad_norm": 0.3180501163005829,
+ "learning_rate": 1.0000000000000002e-07,
+ "loss": 0.1222,
+ "step": 27500
+ },
+ {
+ "epoch": 101.0,
+ "eval_accuracy": 0.3097713097713098,
+ "eval_f1_macro": 0.7199208624613478,
+ "eval_f1_micro": 0.8230599775551769,
+ "eval_loss": 0.12071150541305542,
+ "eval_roc_auc": 0.8825144680800833,
+ "eval_runtime": 753.7405,
+ "eval_samples_per_second": 3.829,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 27573
+ },
+ {
+ "epoch": 102.0,
+ "eval_accuracy": 0.31011781011781014,
+ "eval_f1_macro": 0.7181176324357539,
+ "eval_f1_micro": 0.821560093739538,
+ "eval_loss": 0.12102664262056351,
+ "eval_roc_auc": 0.8796515695707274,
+ "eval_runtime": 750.0067,
+ "eval_samples_per_second": 3.848,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 27846
+ },
+ {
+ "epoch": 102.56410256410257,
+ "grad_norm": 0.257368803024292,
+ "learning_rate": 1.0000000000000002e-07,
+ "loss": 0.1212,
+ "step": 28000
+ },
+ {
+ "epoch": 103.0,
+ "eval_accuracy": 0.31115731115731116,
+ "eval_f1_macro": 0.7156251632807489,
+ "eval_f1_micro": 0.8218559116391932,
+ "eval_loss": 0.12072332948446274,
+ "eval_roc_auc": 0.879889475994201,
+ "eval_runtime": 747.7283,
+ "eval_samples_per_second": 3.86,
+ "eval_steps_per_second": 0.122,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 28119
+ },
+ {
+ "epoch": 104.0,
+ "eval_accuracy": 0.3090783090783091,
+ "eval_f1_macro": 0.7151217785983346,
+ "eval_f1_micro": 0.8214226220223222,
+ "eval_loss": 0.12122868001461029,
+ "eval_roc_auc": 0.8810201217110805,
+ "eval_runtime": 751.9776,
+ "eval_samples_per_second": 3.838,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 28392
+ },
+ {
+ "epoch": 104.3956043956044,
+ "grad_norm": 0.2758227586746216,
+ "learning_rate": 1.0000000000000004e-08,
+ "loss": 0.1204,
+ "step": 28500
+ },
+ {
+ "epoch": 105.0,
+ "eval_accuracy": 0.30838530838530837,
+ "eval_f1_macro": 0.7175066761763569,
+ "eval_f1_micro": 0.8216449497883642,
+ "eval_loss": 0.12081456929445267,
+ "eval_roc_auc": 0.882214590091632,
+ "eval_runtime": 750.7114,
+ "eval_samples_per_second": 3.844,
+ "eval_steps_per_second": 0.121,
+ "learning_rate": 1.0000000000000004e-08,
+ "step": 28665
+ },
+ {
+ "epoch": 105.0,
+ "learning_rate": 1.0000000000000004e-08,
+ "step": 28665,
+ "total_flos": 5.049640374682393e+21,
+ "train_loss": 0.023157235795491324,
+ "train_runtime": 62002.1626,
+ "train_samples_per_second": 21.086,
+ "train_steps_per_second": 0.66
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 40950,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 150,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 10,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5.049640374682393e+21,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
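The log can be checked against its own settings. This check rests on two assumptions that the excerpt does not state: that the file is the Hugging Face Trainer's `trainer_state.json`, and that `eval_loss` is the monitored metric. The settings give `max_steps` / `num_train_epochs` = 40950 / 150 = 273 optimizer steps per epoch. The lowest `eval_loss` in this excerpt is 0.12069, at epoch 95. The next ten evaluations (epochs 96 to 105) do not improve on it, which exhausts `early_stopping_patience: 10`. That matches training ending at step 28665 = 105 × 273 instead of at `max_steps`.

The sketch below replays that patience rule. It is a simplified re-implementation, not the `transformers` `EarlyStoppingCallback` itself. The helper names are hypothetical, and the sample data copies only the `eval_loss` values for epochs 90 to 105, plus the final summary record.

```python
import json
from typing import Any, Optional


def load_state(path: str) -> dict[str, Any]:
    """Read a trainer-state JSON file; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def steps_per_epoch(state: dict[str, Any]) -> int:
    """Optimizer steps per epoch implied by max_steps and num_train_epochs."""
    return state["max_steps"] // state["num_train_epochs"]


def eval_records(state: dict[str, Any], metric: str = "eval_loss") -> list[dict[str, Any]]:
    """Keep only log_history entries that carry the monitored eval metric."""
    return [rec for rec in state["log_history"] if metric in rec]


def replay_early_stopping(
    records: list[dict[str, Any]],
    metric: str = "eval_loss",
    patience: int = 10,
    threshold: float = 0.0,
) -> tuple[Optional[float], Optional[float], Optional[float]]:
    """Replay a lower-is-better patience rule over eval records (simplified sketch).

    Returns (best_epoch, best_value, stop_epoch); stop_epoch is None when
    patience is never exhausted within the records given.
    """
    best_epoch: Optional[float] = None
    best_value: Optional[float] = None
    counter = 0
    for rec in records:
        value = rec[metric]
        # An improvement must beat the best value by more than `threshold`.
        improved = best_value is None or (
            value < best_value and abs(value - best_value) > threshold
        )
        if improved:
            best_epoch, best_value, counter = rec["epoch"], value, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, best_value, rec["epoch"]
    return best_epoch, best_value, None


# eval_loss values copied from the log above, epochs 90 to 105 in order.
_EVAL_LOSS = [
    0.12115956842899323, 0.1208055168390274, 0.12069901078939438,
    0.12093978375196457, 0.12092197686433792, 0.1206900030374527,
    0.12142115086317062, 0.12264719605445862, 0.12095578759908676,
    0.12075632065534592, 0.12078335881233215, 0.12071150541305542,
    0.12102664262056351, 0.12072332948446274, 0.12122868001461029,
    0.12081456929445267,
]
SAMPLE_STATE: dict[str, Any] = {
    "max_steps": 40950,
    "num_train_epochs": 150,
    # Final summary record has no eval_loss, so eval_records() drops it.
    "log_history": [{"epoch": 90.0 + i, "eval_loss": v} for i, v in enumerate(_EVAL_LOSS)]
    + [{"epoch": 105.0, "step": 28665, "train_loss": 0.023157235795491324}],
}

if __name__ == "__main__":
    # For a real file: state = load_state("trainer_state.json")  # assumed filename
    state = SAMPLE_STATE
    spe = steps_per_epoch(state)
    best_epoch, best_loss, stop_epoch = replay_early_stopping(eval_records(state))
    print(spe, best_epoch, round(best_loss, 5), stop_epoch, int(stop_epoch * spe))
    # prints: 273 95.0 0.12069 105.0 28665
```

If `load_state` is pointed at the full file, the replay covers every epoch from the start of the run. If the run monitored a different `metric_for_best_model`, pass that key instead of `eval_loss`.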