lombardata committed 3ea82a1 (1 parent: eac6183)

Evaluation on the test set completed on 2024_11_14.

README.md CHANGED
@@ -1,172 +1,124 @@
-
  ---
- language:
- - eng
- license: cc0-1.0
  tags:
- - multilabel-image-classification
- - multilabel
  - generated_from_trainer
- base_model: drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs
  model-index:
  - name: drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs
  results: []
  ---

- drone-DinoVdeau-from-binary is a fine-tuned version of [drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs](https://huggingface.co/drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs). It achieves the following results on the test set:
-
-
- - Loss: 0.4693
- - F1 Micro: 0.0000
- - F1 Macro: 0.0000
- - Accuracy: 0.0000
- - RMSE: 0.1576
- - MAE: 0.1172
- - KL Divergence: 0.4185
-
- ---

- # Model description
- drone-DinoVdeau-from-binary is a model built on top of drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs model for underwater multilabel image classification.The classification head is a combination of linear, ReLU, batch normalization, and dropout layers.

- The source code for training the model can be found in this [Git repository](https://github.com/SeatizenDOI/DinoVdeau).

- - **Developed by:** [lombardata](https://huggingface.co/lombardata), credits to [César Leblanc](https://huggingface.co/CesarLeblanc) and [Victor Illien](https://huggingface.co/groderg)

- ---

- # Intended uses & limitations
- You can use the raw model for classify diverse marine species, encompassing coral morphotypes classes taken from the Global Coral Reef Monitoring Network (GCRMN), habitats classes and seagrass species.

- ---

- # Training and evaluation data
- Details on the estimated number of images for each class are given in the following table:
- | Class | train | test | val | Total |
- |:------------------------|--------:|-------:|------:|--------:|
- | Acropore_branched | 1220 | 363 | 362 | 1945 |
- | Acropore_digitised | 586 | 195 | 189 | 970 |
- | Acropore_tabular | 308 | 133 | 119 | 560 |
- | Algae | 4777 | 1372 | 1384 | 7533 |
- | Dead_coral | 2513 | 671 | 693 | 3877 |
- | Millepore | 136 | 55 | 59 | 250 |
- | No_acropore_encrusting | 252 | 88 | 93 | 433 |
- | No_acropore_massive | 2158 | 725 | 726 | 3609 |
- | No_acropore_sub_massive | 2036 | 582 | 612 | 3230 |
- | Rock | 5976 | 1941 | 1928 | 9845 |
- | Rubble | 4851 | 1486 | 1474 | 7811 |
- | Sand | 6155 | 2019 | 1990 | 10164 |

- ---

- # Training procedure

- ## Training hyperparameters

  The following hyperparameters were used during training:
-
- - **Number of Epochs**: 62.0
- - **Learning Rate**: 0.001
- - **Train Batch Size**: 16
- - **Eval Batch Size**: 16
- - **Optimizer**: Adam
- - **LR Scheduler Type**: ReduceLROnPlateau with a patience of 5 epochs and a factor of 0.1
- - **Freeze Encoder**: Yes
- - **Data Augmentation**: Yes
-
-
- ## Data Augmentation
- Data were augmented using the following transformations :
-
- Train Transforms
- - **PreProcess**: No additional parameters
- - **Resize**: probability=1.00
- - **RandomHorizontalFlip**: probability=0.25
- - **RandomVerticalFlip**: probability=0.25
- - **ColorJiggle**: probability=0.25
- - **RandomPerspective**: probability=0.25
- - **Normalize**: probability=1.00
-
- Val Transforms
- - **PreProcess**: No additional parameters
- - **Resize**: probability=1.00
- - **Normalize**: probability=1.00
-
-
-
- ## Training results
- Epoch | Validation Loss | MAE | RMSE | KL div | Learning Rate
- --- | --- | --- | --- | --- | ---
- 1 | 0.4821413457393646 | 0.1308 | 0.1731 | 0.4219 | 0.001
- 2 | 0.4784533977508545 | 0.1263 | 0.1710 | 0.6148 | 0.001
- 3 | 0.47776785492897034 | 0.1273 | 0.1699 | 0.4880 | 0.001
- 4 | 0.4793245792388916 | 0.1290 | 0.1710 | 0.3418 | 0.001
- 5 | 0.47521594166755676 | 0.1280 | 0.1674 | 0.3456 | 0.001
- 6 | 0.478865385055542 | 0.1254 | 0.1707 | 0.6403 | 0.001
- 7 | 0.4779475927352905 | 0.1288 | 0.1709 | 0.5492 | 0.001
- 8 | 0.4756968021392822 | 0.1270 | 0.1678 | 0.3383 | 0.001
- 9 | 0.4731782376766205 | 0.1231 | 0.1657 | 0.5418 | 0.001
- 10 | 0.4799855649471283 | 0.1321 | 0.1723 | 0.1547 | 0.001
- 11 | 0.4731641411781311 | 0.1256 | 0.1656 | 0.3437 | 0.001
- 12 | 0.47767141461372375 | 0.1293 | 0.1701 | 0.2947 | 0.001
- 13 | 0.48009705543518066 | 0.1248 | 0.1677 | 0.6136 | 0.001
- 14 | 0.4954195022583008 | 0.1253 | 0.1669 | inf | 0.001
- 15 | 0.4812238812446594 | 0.1254 | 0.1662 | inf | 0.001
- 16 | 0.4858487546443939 | 0.1243 | 0.1656 | inf | 0.0001
- 17 | 0.47084349393844604 | 0.1223 | 0.1628 | 0.4165 | 0.0001
- 18 | 0.4707622528076172 | 0.1216 | 0.1626 | 0.4066 | 0.0001
- 19 | 0.47095733880996704 | 0.1227 | 0.1632 | 0.3185 | 0.0001
- 20 | 0.4696938395500183 | 0.1205 | 0.1620 | 0.4651 | 0.0001
- 21 | 0.46922874450683594 | 0.1216 | 0.1614 | 0.3773 | 0.0001
- 22 | 0.4685634672641754 | 0.1203 | 0.1609 | 0.4611 | 0.0001
- 23 | 0.47018975019454956 | 0.1226 | 0.1621 | 0.2499 | 0.0001
- 24 | 0.4705464243888855 | 0.1213 | 0.1628 | 0.3702 | 0.0001
- 25 | 0.4678299129009247 | 0.1188 | 0.1601 | 0.5133 | 0.0001
- 26 | 0.46802961826324463 | 0.1179 | 0.1604 | 0.5665 | 0.0001
- 27 | 0.4680938124656677 | 0.1200 | 0.1604 | 0.4242 | 0.0001
- 28 | 0.4693257212638855 | 0.1215 | 0.1616 | 0.2968 | 0.0001
- 29 | 0.46847742795944214 | 0.1197 | 0.1607 | 0.3925 | 0.0001
- 30 | 0.46944141387939453 | 0.1221 | 0.1614 | 0.2495 | 0.0001
- 31 | 0.4678958058357239 | 0.1185 | 0.1601 | 0.4510 | 0.0001
- 32 | 0.46778997778892517 | 0.1193 | 0.1601 | 0.3886 | 1e-05
- 33 | 0.4686955511569977 | 0.1202 | 0.1606 | 0.3132 | 1e-05
- 34 | 0.46784329414367676 | 0.1195 | 0.1601 | 0.3958 | 1e-05
- 35 | 0.4671097695827484 | 0.1180 | 0.1595 | 0.4579 | 1e-05
- 36 | 0.46735426783561707 | 0.1184 | 0.1595 | 0.4391 | 1e-05
- 37 | 0.468018501996994 | 0.1191 | 0.1600 | 0.3633 | 1e-05
- 38 | 0.46701580286026 | 0.1186 | 0.1592 | 0.4303 | 1e-05
- 39 | 0.4673251509666443 | 0.1187 | 0.1596 | 0.4562 | 1e-05
- 40 | 0.4673212468624115 | 0.1189 | 0.1594 | 0.4065 | 1e-05
- 41 | 0.4677547216415405 | 0.1206 | 0.1599 | 0.3336 | 1e-05
- 42 | 0.4671882390975952 | 0.1178 | 0.1597 | 0.5312 | 1e-05
- 43 | 0.46716412901878357 | 0.1185 | 0.1592 | 0.3924 | 1e-05
- 44 | 0.4678168296813965 | 0.1194 | 0.1602 | 0.4259 | 1e-05
- 45 | 0.46699702739715576 | 0.1172 | 0.1594 | 0.5214 | 1.0000000000000002e-06
- 46 | 0.46712958812713623 | 0.1188 | 0.1594 | 0.4175 | 1.0000000000000002e-06
- 47 | 0.4666382074356079 | 0.1188 | 0.1589 | 0.4446 | 1.0000000000000002e-06
- 48 | 0.46714723110198975 | 0.1180 | 0.1597 | 0.5755 | 1.0000000000000002e-06
- 49 | 0.46758702397346497 | 0.1192 | 0.1600 | 0.4304 | 1.0000000000000002e-06
- 50 | 0.46752068400382996 | 0.1204 | 0.1595 | 0.3337 | 1.0000000000000002e-06
- 51 | 0.46691644191741943 | 0.1181 | 0.1591 | 0.3955 | 1.0000000000000002e-06
- 52 | 0.466439425945282 | 0.1175 | 0.1588 | 0.4761 | 1.0000000000000002e-06
- 53 | 0.4667709469795227 | 0.1189 | 0.1590 | 0.4327 | 1.0000000000000002e-06
- 54 | 0.46701404452323914 | 0.1187 | 0.1592 | 0.3725 | 1.0000000000000002e-06
- 55 | 0.467383474111557 | 0.1199 | 0.1595 | 0.3841 | 1.0000000000000002e-06
- 56 | 0.46739572286605835 | 0.1190 | 0.1596 | 0.3822 | 1.0000000000000002e-06
- 57 | 0.46702033281326294 | 0.1186 | 0.1593 | 0.4675 | 1.0000000000000002e-06
- 58 | 0.46735846996307373 | 0.1189 | 0.1596 | 0.3738 | 1.0000000000000002e-06
- 59 | 0.46666717529296875 | 0.1185 | 0.1589 | 0.4204 | 1.0000000000000002e-07
- 60 | 0.46685320138931274 | 0.1178 | 0.1592 | 0.4532 | 1.0000000000000002e-07
- 61 | 0.46734780073165894 | 0.1189 | 0.1596 | 0.4032 | 1.0000000000000002e-07
- 62 | 0.4673011302947998 | 0.1189 | 0.1595 | 0.3407 | 1.0000000000000002e-07
-
-
- ---
-
- # Framework Versions
-
- - **Transformers**: 4.41.0
- - **Pytorch**: 2.5.0+cu124
- - **Datasets**: 3.0.2
- - **Tokenizers**: 0.19.1
-

  ---
+ license: apache-2.0
+ base_model: facebook/dinov2-large
  tags:
  - generated_from_trainer
  model-index:
  - name: drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs
  results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs

+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4061
+ - Rmse: 0.2019
+ - Mae: 0.1446
+ - Kl Divergence: 0.9802
+ - Explained Variance: 0.3860
+ - Learning Rate: 0.0000

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 150
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rmse | Mae | Kl Divergence | Explained Variance | Rate |
+ |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:-------------:|:------------------:|:------:|
+ | No log | 1.0 | 438 | 0.4306 | 0.2210 | 0.1621 | 1.0069 | 0.2882 | 0.001 |
+ | 0.4808 | 2.0 | 876 | 0.4246 | 0.2179 | 0.1547 | 1.3119 | 0.3118 | 0.001 |
+ | 0.421 | 3.0 | 1314 | 0.4223 | 0.2158 | 0.1554 | 1.0982 | 0.3192 | 0.001 |
+ | 0.4151 | 4.0 | 1752 | 0.4191 | 0.2142 | 0.1552 | 1.0414 | 0.3351 | 0.001 |
+ | 0.4114 | 5.0 | 2190 | 0.4171 | 0.2123 | 0.1541 | 1.0698 | 0.3384 | 0.001 |
+ | 0.4089 | 6.0 | 2628 | 0.4209 | 0.2140 | 0.1520 | 1.1959 | 0.3311 | 0.001 |
+ | 0.4091 | 7.0 | 3066 | 0.4166 | 0.2126 | 0.1530 | 1.1709 | 0.3382 | 0.001 |
+ | 0.4071 | 8.0 | 3504 | 0.4195 | 0.2143 | 0.1556 | 0.9712 | 0.3346 | 0.001 |
+ | 0.4071 | 9.0 | 3942 | 0.4167 | 0.2121 | 0.1524 | 1.1432 | 0.3415 | 0.001 |
+ | 0.4062 | 10.0 | 4380 | 0.4186 | 0.2139 | 0.1535 | 0.9121 | 0.3420 | 0.001 |
+ | 0.4052 | 11.0 | 4818 | 0.4156 | 0.2114 | 0.1536 | 0.9950 | 0.3442 | 0.001 |
+ | 0.406 | 12.0 | 5256 | 0.4188 | 0.2139 | 0.1555 | 1.0106 | 0.3390 | 0.001 |
+ | 0.4058 | 13.0 | 5694 | 0.4163 | 0.2121 | 0.1553 | 1.1482 | 0.3425 | 0.001 |
+ | 0.4056 | 14.0 | 6132 | 0.4193 | 0.2138 | 0.1546 | 1.2111 | 0.3286 | 0.001 |
+ | 0.4033 | 15.0 | 6570 | 0.4162 | 0.2121 | 0.1542 | 1.2043 | 0.3402 | 0.001 |
+ | 0.4057 | 16.0 | 7008 | 0.4139 | 0.2102 | 0.1528 | 1.0828 | 0.3500 | 0.001 |
+ | 0.4057 | 17.0 | 7446 | 0.4171 | 0.2118 | 0.1564 | 1.0006 | 0.3430 | 0.001 |
+ | 0.405 | 18.0 | 7884 | 0.4146 | 0.2107 | 0.1507 | 1.0514 | 0.3499 | 0.001 |
+ | 0.4035 | 19.0 | 8322 | 0.4186 | 0.2114 | 0.1532 | 0.9575 | 0.3468 | 0.001 |
+ | 0.4031 | 20.0 | 8760 | 0.4143 | 0.2108 | 0.1513 | 1.1648 | 0.3487 | 0.001 |
+ | 0.4048 | 21.0 | 9198 | 0.4195 | 0.2123 | 0.1533 | 1.2950 | 0.3385 | 0.001 |
+ | 0.4055 | 22.0 | 9636 | 0.4340 | 0.2110 | 0.1524 | inf | 0.3463 | 0.001 |
+ | 0.4022 | 23.0 | 10074 | 0.4327 | 0.2085 | 0.1517 | nan | 0.3621 | 0.0001 |
+ | 0.3978 | 24.0 | 10512 | 0.4385 | 0.2092 | 0.1493 | nan | 0.3583 | 0.0001 |
+ | 0.3978 | 25.0 | 10950 | 0.4272 | 0.2074 | 0.1490 | inf | 0.3649 | 0.0001 |
+ | 0.3988 | 26.0 | 11388 | 0.4105 | 0.2075 | 0.1480 | 1.1903 | 0.3644 | 0.0001 |
+ | 0.3958 | 27.0 | 11826 | 0.4096 | 0.2067 | 0.1494 | 0.9915 | 0.3688 | 0.0001 |
+ | 0.3965 | 28.0 | 12264 | 0.4104 | 0.2075 | 0.1493 | 0.9669 | 0.3681 | 0.0001 |
+ | 0.396 | 29.0 | 12702 | 0.4097 | 0.2069 | 0.1469 | 1.0433 | 0.3696 | 0.0001 |
+ | 0.3936 | 30.0 | 13140 | 0.4094 | 0.2065 | 0.1490 | 0.9082 | 0.3731 | 0.0001 |
+ | 0.3944 | 31.0 | 13578 | 0.4091 | 0.2065 | 0.1470 | 1.0120 | 0.3705 | 0.0001 |
+ | 0.3941 | 32.0 | 14016 | 0.4084 | 0.2060 | 0.1483 | 0.9708 | 0.3742 | 0.0001 |
+ | 0.3941 | 33.0 | 14454 | 0.4082 | 0.2057 | 0.1474 | 0.9317 | 0.3755 | 0.0001 |
+ | 0.3933 | 34.0 | 14892 | 0.4085 | 0.2061 | 0.1481 | 0.9619 | 0.3747 | 0.0001 |
+ | 0.3926 | 35.0 | 15330 | 0.4073 | 0.2054 | 0.1466 | 1.0523 | 0.3758 | 0.0001 |
+ | 0.3936 | 36.0 | 15768 | 0.4074 | 0.2052 | 0.1460 | 1.0622 | 0.3771 | 0.0001 |
+ | 0.3935 | 37.0 | 16206 | 0.4066 | 0.2047 | 0.1456 | 1.0201 | 0.3802 | 0.0001 |
+ | 0.3927 | 38.0 | 16644 | 0.4064 | 0.2045 | 0.1459 | 1.0557 | 0.3800 | 0.0001 |
+ | 0.392 | 39.0 | 17082 | 0.4078 | 0.2056 | 0.1469 | 1.0055 | 0.3771 | 0.0001 |
+ | 0.3915 | 40.0 | 17520 | 0.4068 | 0.2049 | 0.1464 | 0.9849 | 0.3805 | 0.0001 |
+ | 0.3915 | 41.0 | 17958 | 0.4089 | 0.2063 | 0.1489 | 0.8999 | 0.3778 | 0.0001 |
+ | 0.3907 | 42.0 | 18396 | 0.4069 | 0.2049 | 0.1463 | 1.0617 | 0.3797 | 0.0001 |
+ | 0.3919 | 43.0 | 18834 | 0.4058 | 0.2041 | 0.1450 | 1.0520 | 0.3830 | 0.0001 |
+ | 0.3902 | 44.0 | 19272 | 0.4071 | 0.2050 | 0.1475 | 1.0054 | 0.3809 | 0.0001 |
+ | 0.3896 | 45.0 | 19710 | 0.4067 | 0.2047 | 0.1440 | 1.1386 | 0.3813 | 0.0001 |
+ | 0.3925 | 46.0 | 20148 | 0.4067 | 0.2047 | 0.1457 | 1.0253 | 0.3831 | 0.0001 |
+ | 0.3896 | 47.0 | 20586 | 0.4062 | 0.2043 | 0.1473 | 1.0430 | 0.3834 | 0.0001 |
+ | 0.3902 | 48.0 | 21024 | 0.4065 | 0.2048 | 0.1457 | 1.1041 | 0.3812 | 0.0001 |
+ | 0.3902 | 49.0 | 21462 | 0.4071 | 0.2052 | 0.1463 | 1.0702 | 0.3798 | 0.0001 |
+ | 0.3897 | 50.0 | 21900 | 0.4064 | 0.2042 | 0.1479 | 0.8917 | 0.3857 | 1e-05 |
+ | 0.3875 | 51.0 | 22338 | 0.4058 | 0.2041 | 0.1437 | 0.9960 | 0.3845 | 1e-05 |
+ | 0.3874 | 52.0 | 22776 | 0.4053 | 0.2037 | 0.1446 | 1.0567 | 0.3851 | 1e-05 |
+ | 0.3899 | 53.0 | 23214 | 0.4056 | 0.2039 | 0.1462 | 1.0205 | 0.3859 | 1e-05 |
+ | 0.3892 | 54.0 | 23652 | 0.4059 | 0.2041 | 0.1441 | 0.9905 | 0.3854 | 1e-05 |
+ | 0.3892 | 55.0 | 24090 | 0.4061 | 0.2041 | 0.1471 | 0.9379 | 0.3856 | 1e-05 |
+ | 0.3869 | 56.0 | 24528 | 0.4059 | 0.2041 | 0.1454 | 0.9696 | 0.3854 | 1e-05 |
+ | 0.3869 | 57.0 | 24966 | 0.4058 | 0.2041 | 0.1460 | 1.0591 | 0.3842 | 1e-05 |
+ | 0.3874 | 58.0 | 25404 | 0.4063 | 0.2043 | 0.1460 | 0.9276 | 0.3860 | 1e-05 |
+ | 0.3887 | 59.0 | 25842 | 0.4056 | 0.2038 | 0.1453 | 0.9794 | 0.3868 | 0.0000 |
+ | 0.3882 | 60.0 | 26280 | 0.4057 | 0.2040 | 0.1446 | 1.0349 | 0.3851 | 0.0000 |
+ | 0.389 | 61.0 | 26718 | 0.4058 | 0.2041 | 0.1449 | 0.9860 | 0.3857 | 0.0000 |
+ | 0.3882 | 62.0 | 27156 | 0.4054 | 0.2037 | 0.1446 | 0.9528 | 0.3865 | 0.0000 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.0
+ - Pytorch 2.5.0+cu124
+ - Datasets 3.0.2
+ - Tokenizers 0.19.1
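The old card describes the scheduler as ReduceLROnPlateau with a patience of 5 epochs and a factor of 0.1. The regenerated card lists `lr_scheduler_type: linear`. However, both learning-rate columns fall in steps of 10x (0.001, 0.0001, 1e-05, ...) rather than decaying linearly, which looks like plateau behaviour. The sketch below checks this by replaying the old table's validation losses for epochs 1 to 31 through PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`. It makes three assumptions: the scheduler keeps its defaults (relative threshold `1e-4`, no cooldown), it is stepped once per epoch after evaluation, and a dummy parameter stands in for the model. It is a consistency check, not the DinoVdeau training loop.

```python
import torch

# Validation losses for epochs 1-31, copied from the old card's "Training results" table.
VAL_LOSSES = [
    0.4821413457393646, 0.4784533977508545, 0.47776785492897034, 0.4793245792388916,
    0.47521594166755676, 0.478865385055542, 0.4779475927352905, 0.4756968021392822,
    0.4731782376766205, 0.4799855649471283, 0.4731641411781311, 0.47767141461372375,
    0.48009705543518066, 0.4954195022583008, 0.4812238812446594, 0.4858487546443939,
    0.47084349393844604, 0.4707622528076172, 0.47095733880996704, 0.4696938395500183,
    0.46922874450683594, 0.4685634672641754, 0.47018975019454956, 0.4705464243888855,
    0.4678299129009247, 0.46802961826324463, 0.4680938124656677, 0.4693257212638855,
    0.46847742795944214, 0.46944141387939453, 0.4678958058357239,
]


def replay_plateau_schedule(losses, lr=1e-3, factor=0.1, patience=5):
    """Return {epoch: learning rate in effect during that epoch}, epochs 1-indexed."""
    # A single dummy parameter is enough to drive the optimizer/scheduler pair.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.Adam([param], lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=factor, patience=patience
    )
    lr_by_epoch = {}
    for epoch, loss in enumerate(losses, start=1):
        lr_by_epoch[epoch] = optimizer.param_groups[0]["lr"]
        # The scheduler is stepped after the epoch's evaluation, once the loss is known.
        scheduler.step(loss)
    # Rate that would apply in the epoch following the last replayed loss.
    lr_by_epoch[len(losses) + 1] = optimizer.param_groups[0]["lr"]
    return lr_by_epoch


if __name__ == "__main__":
    schedule = replay_plateau_schedule(VAL_LOSSES)
    for epoch in (1, 15, 16, 31, 32):
        print(epoch, schedule[epoch])
```

In this replay the rate drops after epochs 15 and 31. That matches the 0.0001 first logged at epoch 16 and the 1e-05 first logged at epoch 32 in the old table. Epoch 11 (0.47316) does not count as an improvement over epoch 9 (0.47318) under the relative threshold, which is why the first drop is not pushed one epoch later.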
 
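The old card's Model description says the classification head combines linear, ReLU, batch normalization, and dropout layers, and it lists "Freeze Encoder: Yes". The new card names `facebook/dinov2-large` as the base model. A minimal PyTorch sketch of a head of that kind follows. Several details are illustrative assumptions, not the DinoVdeau implementation:

- the layer order;
- the hidden width of 512;
- the dropout rate;
- the sigmoid output;
- the input: a single 1024-dimensional pooled embedding (1024 is the hidden size of DINOv2-large).

The number of outputs, 12, is the number of classes in the old card's data table.

```python
import torch
from torch import nn

EMBED_DIM = 1024   # hidden size of DINOv2-large; the real head's input may differ
NUM_CLASSES = 12   # classes listed in the old card's data table


class MultilabelHead(nn.Module):
    """Hypothetical head: Linear -> ReLU -> BatchNorm -> Dropout -> Linear."""

    def __init__(self, in_dim=EMBED_DIM, hidden_dim=512, num_classes=NUM_CLASSES, p_drop=0.3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_dim),
            nn.Dropout(p_drop),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features):
        # features: (batch, in_dim) embeddings from the frozen encoder.
        # Sigmoid scores each class independently (multilabel, not softmax).
        return torch.sigmoid(self.layers(features))


if __name__ == "__main__":
    head = MultilabelHead().eval()  # eval(): BatchNorm uses running stats, dropout is off
    with torch.no_grad():
        scores = head(torch.randn(4, EMBED_DIM))
    print(scores.shape)  # torch.Size([4, 12])
```

With a frozen encoder, only the parameters of a head like this one would receive gradients during training.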
 
all_results.json CHANGED
@@ -1,17 +1,17 @@
  {
  "epoch": 62.0,
- "eval_explained_variance": 0.44826042652130127,
- "eval_kl_divergence": 0.41849657893180847,
- "eval_loss": 0.4693068563938141,
- "eval_mae": 0.11716800183057785,
- "eval_rmse": 0.15763415396213531,
- "eval_runtime": 70.8809,
- "eval_samples_per_second": 33.253,
- "eval_steps_per_second": 2.088,
- "learning_rate": 1.0000000000000002e-07,
  "total_flos": 6.42634409963284e+19,
- "train_loss": 0.466335079458891,
- "train_runtime": 17194.6751,
- "train_samples_per_second": 61.092,
- "train_steps_per_second": 3.821
  }

  {
  "epoch": 62.0,
+ "eval_explained_variance": 0.38597723841667175,
+ "eval_kl_divergence": 0.9802031517028809,
+ "eval_loss": 0.40605100989341736,
+ "eval_mae": 0.1445874720811844,
+ "eval_rmse": 0.2018682211637497,
+ "eval_runtime": 61.1056,
+ "eval_samples_per_second": 38.573,
+ "eval_steps_per_second": 2.422,
+ "learning_rate": 1.0000000000000002e-06,
  "total_flos": 6.42634409963284e+19,
+ "train_loss": 0.3985773164651095,
+ "train_runtime": 16834.9641,
+ "train_samples_per_second": 62.397,
+ "train_steps_per_second": 3.903
  }
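The result files report RMSE, MAE, explained variance, and KL divergence, and some epochs in the tables show `inf` or `nan` for KL. The exact formulas used for this model are not given here, so the NumPy sketch below uses common definitions as an assumption. Explained variance follows the per-output, uniformly averaged form that scikit-learn's `explained_variance_score` uses by default. The KL helper shows one plausible source of `inf`: a predicted probability of exactly 0 where the target is positive.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error over all entries."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error over all entries."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def explained_variance(y_true, y_pred):
    """1 - Var(residual) / Var(target) per output column, then a plain mean.

    A column whose target variance is zero would divide by zero; this sketch
    does not guard against that case.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_column = 1.0 - np.var(y_true - y_pred, axis=0) / np.var(y_true, axis=0)
    return float(np.mean(per_column))


def kl_divergence(p, q):
    """sum(p * log(p / q)), with the convention 0 * log(0 / q) = 0.

    A target probability p > 0 paired with a prediction q == 0 yields inf.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p / q), 0.0)
    return float(np.sum(terms))


if __name__ == "__main__":
    target = np.array([[0.6, 0.4], [0.1, 0.9], [0.5, 0.5]])
    pred = np.array([[0.5, 0.5], [0.2, 0.8], [0.5, 0.5]])
    print(rmse(target, pred), mae(target, pred), explained_variance(target, pred))
    print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # inf
```

Implementations often clip predictions away from 0 before taking the logarithm, which keeps KL finite. Whether this model's metric code does so is not stated in the files above.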
logs/events.out.tfevents.1731607061.datavisu2 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:72264f0fc4d279613f4c668b0e2c37410de44a5ff7887c72102cbccf436418d8
- size 50291

  version https://git-lfs.github.com/spec/v1
+ oid sha256:6b78d17ccbe193fb3e7efa694c7335abe32b4b424175235d19d34a204ae5784c
+ size 52031
logs/events.out.tfevents.1731624072.datavisu2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b9a76e82fce26948e90638211553f83c99a8ff5929731754be2434210688e4f
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0b262f1226221e7c2c5a08e8c8c09cef4e400f822c23850ddfddb8cde988f980
  size 1222956704

  version https://git-lfs.github.com/spec/v1
+ oid sha256:25b4e914eff7c60cd1a3ff535322cf77a16ee8e82564b50a87aafed111ab96d8
  size 1222956704
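`model.safetensors` and the TensorBoard event files are stored as Git LFS pointers, each consisting of a `version` line, an `oid sha256:` line, and a `size` line. In this commit the weights keep the same size (1222956704 bytes) but get a new SHA-256, which is consistent with retrained weights of the same shape. The standard-library sketch below checks a downloaded file against such a pointer. The function names are hypothetical, and only `sha256` object IDs are handled.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into {'oid': hex digest, 'size': int}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    expected = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != expected["size"]:
        return False  # cheap size check first, before hashing a 1.2 GB file
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == expected["oid"]


# Pointer for the new weights, copied from the diff above.
NEW_WEIGHTS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:25b4e914eff7c60cd1a3ff535322cf77a16ee8e82564b50a87aafed111ab96d8
size 1222956704
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(NEW_WEIGHTS_POINTER)["size"])  # 1222956704
    # Round trip on a small temporary file instead of the real weights.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blob.bin")
        data = b"example payload"
        with open(path, "wb") as fh:
            fh.write(data)
        pointer = (
            "version https://git-lfs.github.com/spec/v1\n"
            f"oid sha256:{hashlib.sha256(data).hexdigest()}\nsize {len(data)}\n"
        )
        print(matches_pointer(path, pointer))  # True
```

For the real check, call `matches_pointer("model.safetensors", NEW_WEIGHTS_POINTER)` on a local download of the weights.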
test_results.json CHANGED
@@ -1,12 +1,12 @@
  {
  "epoch": 62.0,
- "eval_explained_variance": 0.44826042652130127,
- "eval_kl_divergence": 0.41849657893180847,
- "eval_loss": 0.4693068563938141,
- "eval_mae": 0.11716800183057785,
- "eval_rmse": 0.15763415396213531,
- "eval_runtime": 70.8809,
- "eval_samples_per_second": 33.253,
- "eval_steps_per_second": 2.088,
- "learning_rate": 1.0000000000000002e-07
  }

  {
  "epoch": 62.0,
+ "eval_explained_variance": 0.38597723841667175,
+ "eval_kl_divergence": 0.9802031517028809,
+ "eval_loss": 0.40605100989341736,
+ "eval_mae": 0.1445874720811844,
+ "eval_rmse": 0.2018682211637497,
+ "eval_runtime": 61.1056,
+ "eval_samples_per_second": 38.573,
+ "eval_steps_per_second": 2.422,
+ "learning_rate": 1.0000000000000002e-06
  }
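The test-result files do not state the test-set size, but it can be estimated from the reported throughput as runtime times samples per second. Both the old and the new file give about 2,357 images. At an eval batch size of 16 that is 148 batches, which agrees with runtime times steps per second. The sketch below is arithmetic on the rounded JSON values, not a figure the authors report.

```python
import math

EVAL_BATCH_SIZE = 16

# (eval_runtime, eval_samples_per_second, eval_steps_per_second) from test_results.json
OLD = (70.8809, 33.253, 2.088)
NEW = (61.1056, 38.573, 2.422)


def implied_eval_size(runtime, samples_per_s, steps_per_s):
    """Estimate evaluated samples and batches from the throughput figures."""
    samples = round(runtime * samples_per_s)
    steps = round(runtime * steps_per_s)
    return samples, steps


if __name__ == "__main__":
    for name, stats in (("old", OLD), ("new", NEW)):
        samples, steps = implied_eval_size(*stats)
        # A consistent estimate has steps == ceil(samples / batch size).
        print(name, samples, steps, math.ceil(samples / EVAL_BATCH_SIZE))
```

Both runs print `2357 148 148`, so the two evaluations appear to have covered the same number of test images.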
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
  "epoch": 62.0,
- "learning_rate": 1.0000000000000002e-07,
  "total_flos": 6.42634409963284e+19,
- "train_loss": 0.466335079458891,
- "train_runtime": 17194.6751,
- "train_samples_per_second": 61.092,
- "train_steps_per_second": 3.821
  }

  {
  "epoch": 62.0,
+ "learning_rate": 1.0000000000000002e-06,
  "total_flos": 6.42634409963284e+19,
+ "train_loss": 0.3985773164651095,
+ "train_runtime": 16834.9641,
+ "train_samples_per_second": 62.397,
+ "train_steps_per_second": 3.903
  }
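Training stopped at epoch 62, yet `train_runtime` × `train_steps_per_second` comes to roughly 65,700 steps in both files. That equals 150 × 438, the `num_epochs: 150` of the new card times the 438 steps per epoch seen in the logs. It is far from the 62 × 438 = 27,156 steps actually run. This is consistent with the throughput being computed from the planned number of epochs rather than the completed ones. If so, the real rate would be about 62/150 of the reported one. The sketch below reproduces the arithmetic from the rounded JSON values. It also shows that dividing the implied sample count by 150 gives about 7,003 training images per epoch, which needs ceil(7003 / 16) = 438 batches.

```python
STEPS_PER_EPOCH = 438   # the step counter advances by 438 per epoch in the logs
PLANNED_EPOCHS = 150    # num_epochs in the regenerated card
COMPLETED_EPOCHS = 62   # "epoch": 62.0 in train_results.json

# (train_runtime in seconds, train_steps_per_second, train_samples_per_second)
RUNS = {
    "old": (17194.6751, 3.821, 61.092),
    "new": (16834.9641, 3.903, 62.397),
}

if __name__ == "__main__":
    for name, (runtime, steps_per_s, samples_per_s) in RUNS.items():
        implied_steps = runtime * steps_per_s
        # Spread the implied sample count over the planned epochs.
        implied_samples_per_epoch = runtime * samples_per_s / PLANNED_EPOCHS
        print(
            f"{name}: implied steps {implied_steps:.0f} "
            f"(planned {PLANNED_EPOCHS * STEPS_PER_EPOCH}, "
            f"completed {COMPLETED_EPOCHS * STEPS_PER_EPOCH}), "
            f"~{implied_samples_per_epoch:.0f} training images per epoch"
        )
```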
trainer_state.json CHANGED
@@ -1,5 +1,5 @@
  {
- "best_metric": 0.466439425945282,
  "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/drone/drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs/checkpoint-22776",
  "epoch": 62.0,
  "eval_steps": 500,
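In the old state file, `best_metric` is 0.466439425945282, the epoch-52 validation loss in the old results table. `best_model_checkpoint` points at `checkpoint-22776`, and 22776 is 52 × 438. The standard-library sketch below recovers that pair from `log_history`. It assumes, as for a loss, that lower is better, and that a checkpoint was saved at every evaluation step. The function name is hypothetical.

```python
import json


def best_checkpoint(log_history, metric="eval_loss"):
    """Return (best value, step, checkpoint dir name) for the lowest `metric`."""
    # Training-loss entries have no eval metric and are skipped.
    evals = [entry for entry in log_history if metric in entry]
    best = min(evals, key=lambda entry: entry[metric])
    return best[metric], best["step"], f"checkpoint-{best['step']}"


# Old-run evaluation entries for epochs 51-53, with values from the tables above,
# shaped like the entries in trainer_state.json's log_history.
EXCERPT = json.loads("""
[
  {"epoch": 51.0, "eval_loss": 0.46691644191741943, "step": 22338},
  {"epoch": 52.0, "eval_loss": 0.466439425945282, "step": 22776},
  {"epoch": 53.0, "eval_loss": 0.4667709469795227, "step": 23214}
]
""")

if __name__ == "__main__":
    print(best_checkpoint(EXCERPT))
    # With the full file:
    # with open("trainer_state.json") as fh:
    #     print(best_checkpoint(json.load(fh)["log_history"]))
```

The same step, 22776, is also the lowest-loss epoch (52, 0.4053) in the new card's results table, which fits the unchanged `best_model_checkpoint` context line.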
@@ -10,1197 +10,1197 @@
10
  "log_history": [
11
  {
12
  "epoch": 1.0,
13
- "eval_explained_variance": 0.3631434440612793,
14
- "eval_kl_divergence": 0.421912282705307,
15
- "eval_loss": 0.4821413457393646,
16
- "eval_mae": 0.13084472715854645,
17
- "eval_rmse": 0.173090398311615,
18
- "eval_runtime": 64.2475,
19
- "eval_samples_per_second": 36.64,
20
- "eval_steps_per_second": 2.304,
21
  "learning_rate": 0.001,
22
  "step": 438
23
  },
24
  {
25
  "epoch": 1.1415525114155252,
26
- "grad_norm": 0.4757365882396698,
27
  "learning_rate": 0.001,
28
- "loss": 0.5317,
29
  "step": 500
30
  },
31
  {
32
  "epoch": 2.0,
33
- "eval_explained_variance": 0.3752269744873047,
34
- "eval_kl_divergence": 0.6148446202278137,
35
- "eval_loss": 0.4784533977508545,
36
- "eval_mae": 0.12629373371601105,
37
- "eval_rmse": 0.17098082602024078,
38
- "eval_runtime": 63.6833,
39
- "eval_samples_per_second": 36.964,
40
- "eval_steps_per_second": 2.324,
41
  "learning_rate": 0.001,
42
  "step": 876
43
  },
44
  {
45
  "epoch": 2.2831050228310503,
46
- "grad_norm": 0.4254082143306732,
47
  "learning_rate": 0.001,
48
- "loss": 0.4832,
49
  "step": 1000
50
  },
51
  {
52
  "epoch": 3.0,
53
- "eval_explained_variance": 0.3838556706905365,
54
- "eval_kl_divergence": 0.48802560567855835,
55
- "eval_loss": 0.47776785492897034,
56
- "eval_mae": 0.12731628119945526,
57
- "eval_rmse": 0.16985835134983063,
58
- "eval_runtime": 62.637,
59
- "eval_samples_per_second": 37.582,
60
- "eval_steps_per_second": 2.363,
61
  "learning_rate": 0.001,
62
  "step": 1314
63
  },
64
  {
65
  "epoch": 3.4246575342465753,
66
- "grad_norm": 0.2670271098613739,
67
  "learning_rate": 0.001,
68
- "loss": 0.4791,
69
  "step": 1500
70
  },
71
  {
72
  "epoch": 4.0,
73
- "eval_explained_variance": 0.38376739621162415,
74
- "eval_kl_divergence": 0.3418101370334625,
75
- "eval_loss": 0.4793245792388916,
76
- "eval_mae": 0.12901858985424042,
77
- "eval_rmse": 0.171015664935112,
78
- "eval_runtime": 62.2828,
79
- "eval_samples_per_second": 37.795,
80
- "eval_steps_per_second": 2.376,
81
  "learning_rate": 0.001,
82
  "step": 1752
83
  },
84
  {
85
  "epoch": 4.566210045662101,
86
- "grad_norm": 0.20498104393482208,
87
  "learning_rate": 0.001,
88
- "loss": 0.4771,
89
  "step": 2000
90
  },
91
  {
92
  "epoch": 5.0,
93
- "eval_explained_variance": 0.40547776222229004,
94
- "eval_kl_divergence": 0.34562820196151733,
95
- "eval_loss": 0.47521594166755676,
96
- "eval_mae": 0.12799377739429474,
97
- "eval_rmse": 0.16736441850662231,
98
- "eval_runtime": 62.7606,
99
- "eval_samples_per_second": 37.508,
100
- "eval_steps_per_second": 2.358,
101
  "learning_rate": 0.001,
102
  "step": 2190
103
  },
104
  {
105
  "epoch": 5.707762557077626,
106
- "grad_norm": 0.24335043132305145,
107
  "learning_rate": 0.001,
108
- "loss": 0.4752,
109
  "step": 2500
110
  },
111
  {
112
  "epoch": 6.0,
113
- "eval_explained_variance": 0.3849389851093292,
114
- "eval_kl_divergence": 0.6402714848518372,
115
- "eval_loss": 0.478865385055542,
116
- "eval_mae": 0.12540282309055328,
117
- "eval_rmse": 0.17068879306316376,
118
- "eval_runtime": 63.4836,
119
- "eval_samples_per_second": 37.08,
120
- "eval_steps_per_second": 2.331,
121
  "learning_rate": 0.001,
122
  "step": 2628
123
  },
124
  {
125
  "epoch": 6.8493150684931505,
126
- "grad_norm": 0.18768365681171417,
127
  "learning_rate": 0.001,
128
- "loss": 0.4752,
129
  "step": 3000
130
  },
131
  {
132
  "epoch": 7.0,
133
- "eval_explained_variance": 0.3788411617279053,
134
- "eval_kl_divergence": 0.5491646528244019,
135
- "eval_loss": 0.4779475927352905,
136
- "eval_mae": 0.12878474593162537,
137
- "eval_rmse": 0.17091502249240875,
138
- "eval_runtime": 63.4904,
139
- "eval_samples_per_second": 37.076,
140
- "eval_steps_per_second": 2.331,
141
  "learning_rate": 0.001,
142
  "step": 3066
143
  },
144
  {
145
  "epoch": 7.9908675799086755,
146
- "grad_norm": 0.1587909311056137,
147
  "learning_rate": 0.001,
148
- "loss": 0.4735,
149
  "step": 3500
150
  },
151
  {
152
  "epoch": 8.0,
153
- "eval_explained_variance": 0.40857037901878357,
154
- "eval_kl_divergence": 0.33827269077301025,
155
- "eval_loss": 0.4756968021392822,
156
- "eval_mae": 0.12695902585983276,
157
- "eval_rmse": 0.16784566640853882,
158
- "eval_runtime": 64.8792,
159
- "eval_samples_per_second": 36.283,
160
- "eval_steps_per_second": 2.281,
161
  "learning_rate": 0.001,
162
  "step": 3504
163
  },
164
  {
165
  "epoch": 9.0,
166
- "eval_explained_variance": 0.4111548960208893,
167
- "eval_kl_divergence": 0.5417521595954895,
168
- "eval_loss": 0.4731782376766205,
169
- "eval_mae": 0.12311580032110214,
170
- "eval_rmse": 0.1657222956418991,
171
- "eval_runtime": 61.0673,
172
- "eval_samples_per_second": 38.548,
173
- "eval_steps_per_second": 2.424,
174
  "learning_rate": 0.001,
175
  "step": 3942
176
  },
177
  {
178
  "epoch": 9.132420091324201,
179
- "grad_norm": 0.1892658919095993,
180
  "learning_rate": 0.001,
181
- "loss": 0.4719,
182
  "step": 4000
183
  },
184
  {
185
  "epoch": 10.0,
186
- "eval_explained_variance": 0.4006313979625702,
187
- "eval_kl_divergence": 0.15472176671028137,
188
- "eval_loss": 0.4799855649471283,
189
- "eval_mae": 0.1320570707321167,
190
- "eval_rmse": 0.1722680777311325,
191
- "eval_runtime": 62.7974,
192
- "eval_samples_per_second": 37.486,
193
- "eval_steps_per_second": 2.357,
194
  "learning_rate": 0.001,
195
  "step": 4380
196
  },
197
  {
198
  "epoch": 10.273972602739725,
199
- "grad_norm": 0.20271532237529755,
200
  "learning_rate": 0.001,
201
- "loss": 0.4727,
202
  "step": 4500
203
  },
204
  {
205
  "epoch": 11.0,
206
- "eval_explained_variance": 0.41580215096473694,
207
- "eval_kl_divergence": 0.3436921238899231,
208
- "eval_loss": 0.4731641411781311,
209
- "eval_mae": 0.12562014162540436,
210
- "eval_rmse": 0.16564464569091797,
211
- "eval_runtime": 62.757,
212
- "eval_samples_per_second": 37.51,
213
- "eval_steps_per_second": 2.358,
214
  "learning_rate": 0.001,
215
  "step": 4818
216
  },
217
  {
218
  "epoch": 11.415525114155251,
219
- "grad_norm": 0.15571434795856476,
220
  "learning_rate": 0.001,
221
- "loss": 0.4723,
222
  "step": 5000
223
  },
224
  {
225
  "epoch": 12.0,
226
- "eval_explained_variance": 0.401115745306015,
227
- "eval_kl_divergence": 0.2946830093860626,
228
- "eval_loss": 0.47767141461372375,
229
- "eval_mae": 0.12927968800067902,
230
- "eval_rmse": 0.17007046937942505,
231
- "eval_runtime": 62.3684,
232
- "eval_samples_per_second": 37.743,
233
- "eval_steps_per_second": 2.373,
234
  "learning_rate": 0.001,
235
  "step": 5256
236
  },
237
  {
238
  "epoch": 12.557077625570777,
239
- "grad_norm": 0.16043365001678467,
240
  "learning_rate": 0.001,
241
- "loss": 0.4735,
242
  "step": 5500
243
  },
244
  {
245
  "epoch": 13.0,
246
- "eval_explained_variance": 0.39704158902168274,
247
- "eval_kl_divergence": 0.6136478781700134,
248
- "eval_loss": 0.48009705543518066,
249
- "eval_mae": 0.12475714087486267,
250
- "eval_rmse": 0.16773907840251923,
251
- "eval_runtime": 64.7917,
252
- "eval_samples_per_second": 36.332,
253
- "eval_steps_per_second": 2.284,
254
  "learning_rate": 0.001,
255
  "step": 5694
256
  },
257
  {
258
  "epoch": 13.698630136986301,
259
- "grad_norm": 0.13616104423999786,
260
  "learning_rate": 0.001,
261
- "loss": 0.4728,
262
  "step": 6000
263
  },
264
  {
265
  "epoch": 14.0,
266
- "eval_explained_variance": 0.40355002880096436,
267
- "eval_kl_divergence": Infinity,
268
- "eval_loss": 0.4954195022583008,
269
- "eval_mae": 0.12534154951572418,
270
- "eval_rmse": 0.16692323982715607,
271
- "eval_runtime": 62.1801,
272
- "eval_samples_per_second": 37.858,
273
- "eval_steps_per_second": 2.38,
274
  "learning_rate": 0.001,
275
  "step": 6132
276
  },
277
  {
278
  "epoch": 14.840182648401827,
279
- "grad_norm": 0.12133222818374634,
280
  "learning_rate": 0.001,
281
- "loss": 0.4713,
282
  "step": 6500
283
  },
284
  {
285
  "epoch": 15.0,
286
- "eval_explained_variance": 0.4051372706890106,
287
- "eval_kl_divergence": Infinity,
288
- "eval_loss": 0.4812238812446594,
289
- "eval_mae": 0.12540575861930847,
290
- "eval_rmse": 0.16624794900417328,
291
- "eval_runtime": 61.3206,
292
- "eval_samples_per_second": 38.388,
293
- "eval_steps_per_second": 2.414,
294
  "learning_rate": 0.001,
295
  "step": 6570
296
  },
297
  {
298
  "epoch": 15.981735159817351,
299
- "grad_norm": 0.11760278791189194,
300
- "learning_rate": 0.0001,
301
- "loss": 0.4706,
302
  "step": 7000
303
  },
304
  {
305
  "epoch": 16.0,
306
- "eval_explained_variance": 0.41243478655815125,
307
- "eval_kl_divergence": Infinity,
308
- "eval_loss": 0.4858487546443939,
309
- "eval_mae": 0.12432911992073059,
310
- "eval_rmse": 0.16562338173389435,
311
- "eval_runtime": 61.1501,
312
- "eval_samples_per_second": 38.495,
313
- "eval_steps_per_second": 2.42,
314
- "learning_rate": 0.0001,
315
  "step": 7008
316
  },
317
  {
318
  "epoch": 17.0,
319
- "eval_explained_variance": 0.4291960895061493,
320
- "eval_kl_divergence": 0.41650328040122986,
321
- "eval_loss": 0.47084349393844604,
322
- "eval_mae": 0.12233477830886841,
323
- "eval_rmse": 0.162751242518425,
324
- "eval_runtime": 60.5736,
325
- "eval_samples_per_second": 38.862,
326
- "eval_steps_per_second": 2.443,
327
- "learning_rate": 0.0001,
328
  "step": 7446
329
  },
330
  {
331
  "epoch": 17.123287671232877,
332
- "grad_norm": 0.13284093141555786,
333
- "learning_rate": 0.0001,
334
- "loss": 0.4672,
335
  "step": 7500
336
  },
337
  {
338
  "epoch": 18.0,
339
- "eval_explained_variance": 0.43114474415779114,
340
- "eval_kl_divergence": 0.4066373407840729,
341
- "eval_loss": 0.4707622528076172,
342
- "eval_mae": 0.12164173275232315,
343
- "eval_rmse": 0.16261519491672516,
344
- "eval_runtime": 63.1248,
345
- "eval_samples_per_second": 37.291,
346
- "eval_steps_per_second": 2.345,
347
- "learning_rate": 0.0001,
348
  "step": 7884
349
  },
350
  {
351
  "epoch": 18.264840182648403,
352
- "grad_norm": 0.11859569698572159,
353
- "learning_rate": 0.0001,
354
- "loss": 0.4659,
355
  "step": 8000
356
  },
357
  {
358
  "epoch": 19.0,
359
- "eval_explained_variance": 0.4342735707759857,
360
- "eval_kl_divergence": 0.31854644417762756,
361
- "eval_loss": 0.47095733880996704,
362
- "eval_mae": 0.12272538989782333,
363
- "eval_rmse": 0.16323107481002808,
364
- "eval_runtime": 61.7089,
365
- "eval_samples_per_second": 38.147,
366
- "eval_steps_per_second": 2.398,
367
- "learning_rate": 0.0001,
368
  "step": 8322
369
  },
370
  {
371
  "epoch": 19.40639269406393,
372
- "grad_norm": 0.16951066255569458,
373
- "learning_rate": 0.0001,
374
- "loss": 0.4653,
375
  "step": 8500
376
  },
377
  {
378
  "epoch": 20.0,
379
- "eval_explained_variance": 0.43487218022346497,
380
- "eval_kl_divergence": 0.465139240026474,
381
- "eval_loss": 0.4696938395500183,
382
- "eval_mae": 0.12050192803144455,
383
- "eval_rmse": 0.1620241105556488,
384
- "eval_runtime": 61.0162,
385
- "eval_samples_per_second": 38.58,
386
- "eval_steps_per_second": 2.426,
387
- "learning_rate": 0.0001,
388
  "step": 8760
389
  },
390
  {
391
  "epoch": 20.54794520547945,
392
- "grad_norm": 0.167369082570076,
393
- "learning_rate": 0.0001,
394
- "loss": 0.4653,
395
  "step": 9000
396
  },
397
  {
398
  "epoch": 21.0,
399
- "eval_explained_variance": 0.4389828145503998,
400
- "eval_kl_divergence": 0.3772728741168976,
401
- "eval_loss": 0.46922874450683594,
402
- "eval_mae": 0.12155676633119583,
403
- "eval_rmse": 0.16139467060565948,
404
- "eval_runtime": 62.17,
405
- "eval_samples_per_second": 37.864,
406
- "eval_steps_per_second": 2.381,
407
- "learning_rate": 0.0001,
408
  "step": 9198
409
  },
410
  {
411
  "epoch": 21.689497716894977,
412
- "grad_norm": 0.1247042864561081,
413
- "learning_rate": 0.0001,
414
- "loss": 0.4659,
415
  "step": 9500
416
  },
417
  {
418
  "epoch": 22.0,
419
- "eval_explained_variance": 0.43975934386253357,
420
- "eval_kl_divergence": 0.4611187279224396,
421
- "eval_loss": 0.4685634672641754,
422
- "eval_mae": 0.1203194335103035,
423
- "eval_rmse": 0.16088876128196716,
424
- "eval_runtime": 62.0052,
425
- "eval_samples_per_second": 37.965,
426
- "eval_steps_per_second": 2.387,
427
- "learning_rate": 0.0001,
428
  "step": 9636
429
  },
430
  {
431
  "epoch": 22.831050228310502,
432
- "grad_norm": 0.16208066046237946,
433
  "learning_rate": 0.0001,
434
- "loss": 0.465,
435
  "step": 10000
436
  },
437
  {
  "epoch": 23.0,
- "eval_explained_variance": 0.44279029965400696,
- "eval_kl_divergence": 0.24986685812473297,
- "eval_loss": 0.47018975019454956,
- "eval_mae": 0.12256480008363724,
- "eval_rmse": 0.16208301484584808,
- "eval_runtime": 61.6543,
- "eval_samples_per_second": 38.181,
- "eval_steps_per_second": 2.4,
  "learning_rate": 0.0001,
  "step": 10074
  },
  {
  "epoch": 23.972602739726028,
- "grad_norm": 0.17417912185192108,
  "learning_rate": 0.0001,
- "loss": 0.4633,
  "step": 10500
  },
  {
  "epoch": 24.0,
- "eval_explained_variance": 0.4367590844631195,
- "eval_kl_divergence": 0.3702172040939331,
- "eval_loss": 0.4705464243888855,
- "eval_mae": 0.12131566554307938,
- "eval_rmse": 0.16277877986431122,
- "eval_runtime": 62.8273,
- "eval_samples_per_second": 37.468,
- "eval_steps_per_second": 2.356,
  "learning_rate": 0.0001,
  "step": 10512
  },
  {
  "epoch": 25.0,
- "eval_explained_variance": 0.4433206617832184,
- "eval_kl_divergence": 0.5132729411125183,
- "eval_loss": 0.4678299129009247,
- "eval_mae": 0.11875440925359726,
- "eval_rmse": 0.16013289988040924,
- "eval_runtime": 61.7077,
- "eval_samples_per_second": 38.148,
- "eval_steps_per_second": 2.398,
  "learning_rate": 0.0001,
  "step": 10950
  },
  {
  "epoch": 25.114155251141554,
- "grad_norm": 0.13617579638957977,
  "learning_rate": 0.0001,
- "loss": 0.4656,
  "step": 11000
  },
  {
  "epoch": 26.0,
- "eval_explained_variance": 0.4423791468143463,
- "eval_kl_divergence": 0.5665323138237,
- "eval_loss": 0.46802961826324463,
- "eval_mae": 0.117874376475811,
- "eval_rmse": 0.1604483276605606,
- "eval_runtime": 61.9639,
- "eval_samples_per_second": 37.99,
- "eval_steps_per_second": 2.388,
  "learning_rate": 0.0001,
  "step": 11388
  },
  {
  "epoch": 26.255707762557076,
- "grad_norm": 0.15818916261196136,
  "learning_rate": 0.0001,
- "loss": 0.4629,
  "step": 11500
  },
  {
  "epoch": 27.0,
- "eval_explained_variance": 0.4434410333633423,
- "eval_kl_divergence": 0.42424070835113525,
- "eval_loss": 0.4680938124656677,
- "eval_mae": 0.1199984923005104,
- "eval_rmse": 0.16038183867931366,
- "eval_runtime": 62.3144,
- "eval_samples_per_second": 37.776,
- "eval_steps_per_second": 2.375,
  "learning_rate": 0.0001,
  "step": 11826
  },
  {
  "epoch": 27.397260273972602,
- "grad_norm": 0.15971983969211578,
  "learning_rate": 0.0001,
- "loss": 0.4636,
  "step": 12000
  },
  {
  "epoch": 28.0,
- "eval_explained_variance": 0.44512465596199036,
- "eval_kl_divergence": 0.2967982292175293,
- "eval_loss": 0.4693257212638855,
- "eval_mae": 0.12149528414011002,
- "eval_rmse": 0.1616295725107193,
- "eval_runtime": 66.789,
- "eval_samples_per_second": 35.245,
- "eval_steps_per_second": 2.216,
  "learning_rate": 0.0001,
  "step": 12264
  },
  {
  "epoch": 28.538812785388128,
- "grad_norm": 0.15448875725269318,
  "learning_rate": 0.0001,
- "loss": 0.4633,
  "step": 12500
  },
  {
  "epoch": 29.0,
- "eval_explained_variance": 0.4442717730998993,
- "eval_kl_divergence": 0.3924856185913086,
- "eval_loss": 0.46847742795944214,
- "eval_mae": 0.1196620985865593,
- "eval_rmse": 0.16072382032871246,
- "eval_runtime": 61.9086,
- "eval_samples_per_second": 38.024,
- "eval_steps_per_second": 2.391,
  "learning_rate": 0.0001,
  "step": 12702
  },
  {
  "epoch": 29.680365296803654,
- "grad_norm": 0.15532433986663818,
  "learning_rate": 0.0001,
- "loss": 0.4631,
  "step": 13000
  },
  {
  "epoch": 30.0,
- "eval_explained_variance": 0.4473068416118622,
- "eval_kl_divergence": 0.2495478093624115,
- "eval_loss": 0.46944141387939453,
- "eval_mae": 0.12209376692771912,
- "eval_rmse": 0.16142255067825317,
- "eval_runtime": 62.4285,
- "eval_samples_per_second": 37.707,
- "eval_steps_per_second": 2.371,
  "learning_rate": 0.0001,
  "step": 13140
  },
  {
  "epoch": 30.82191780821918,
- "grad_norm": 0.1961052566766739,
  "learning_rate": 0.0001,
- "loss": 0.463,
  "step": 13500
  },
  {
  "epoch": 31.0,
- "eval_explained_variance": 0.4445982277393341,
- "eval_kl_divergence": 0.45099732279777527,
- "eval_loss": 0.4678958058357239,
- "eval_mae": 0.11854251474142075,
- "eval_rmse": 0.16011421382427216,
- "eval_runtime": 61.3729,
- "eval_samples_per_second": 38.356,
- "eval_steps_per_second": 2.411,
  "learning_rate": 0.0001,
  "step": 13578
  },
  {
  "epoch": 31.963470319634702,
- "grad_norm": 0.3346303701400757,
- "learning_rate": 1e-05,
- "loss": 0.4623,
  "step": 14000
  },
  {
  "epoch": 32.0,
- "eval_explained_variance": 0.4478188455104828,
- "eval_kl_divergence": 0.3885524570941925,
- "eval_loss": 0.46778997778892517,
- "eval_mae": 0.11933697015047073,
- "eval_rmse": 0.16006481647491455,
- "eval_runtime": 63.8544,
- "eval_samples_per_second": 36.865,
- "eval_steps_per_second": 2.318,
- "learning_rate": 1e-05,
  "step": 14016
  },
  {
  "epoch": 33.0,
- "eval_explained_variance": 0.44756200909614563,
- "eval_kl_divergence": 0.31322383880615234,
- "eval_loss": 0.4686955511569977,
- "eval_mae": 0.1201881393790245,
- "eval_rmse": 0.16055406630039215,
- "eval_runtime": 62.7334,
- "eval_samples_per_second": 37.524,
- "eval_steps_per_second": 2.359,
- "learning_rate": 1e-05,
  "step": 14454
  },
  {
  "epoch": 33.10502283105023,
- "grad_norm": 0.21087272465229034,
- "learning_rate": 1e-05,
- "loss": 0.4621,
  "step": 14500
  },
  {
  "epoch": 34.0,
- "eval_explained_variance": 0.4478868544101715,
- "eval_kl_divergence": 0.3957745432853699,
- "eval_loss": 0.46784329414367676,
- "eval_mae": 0.11951460689306259,
- "eval_rmse": 0.1600986272096634,
- "eval_runtime": 60.6174,
- "eval_samples_per_second": 38.834,
- "eval_steps_per_second": 2.442,
- "learning_rate": 1e-05,
  "step": 14892
  },
  {
  "epoch": 34.24657534246575,
- "grad_norm": 0.1875353455543518,
- "learning_rate": 1e-05,
- "loss": 0.4607,
  "step": 15000
  },
  {
  "epoch": 35.0,
- "eval_explained_variance": 0.44849491119384766,
- "eval_kl_divergence": 0.45786312222480774,
- "eval_loss": 0.4671097695827484,
- "eval_mae": 0.11800643056631088,
- "eval_rmse": 0.15947793424129486,
- "eval_runtime": 61.7609,
- "eval_samples_per_second": 38.115,
- "eval_steps_per_second": 2.396,
- "learning_rate": 1e-05,
  "step": 15330
  },
  {
  "epoch": 35.38812785388128,
- "grad_norm": 0.16752338409423828,
- "learning_rate": 1e-05,
- "loss": 0.4619,
  "step": 15500
  },
  {
  "epoch": 36.0,
- "eval_explained_variance": 0.44863569736480713,
- "eval_kl_divergence": 0.43913933634757996,
- "eval_loss": 0.46735426783561707,
- "eval_mae": 0.11842861026525497,
- "eval_rmse": 0.15950414538383484,
- "eval_runtime": 64.5571,
- "eval_samples_per_second": 36.464,
- "eval_steps_per_second": 2.293,
- "learning_rate": 1e-05,
  "step": 15768
  },
  {
  "epoch": 36.529680365296805,
- "grad_norm": 0.15660376846790314,
- "learning_rate": 1e-05,
- "loss": 0.4612,
  "step": 16000
  },
  {
  "epoch": 37.0,
- "eval_explained_variance": 0.44943228363990784,
- "eval_kl_divergence": 0.36332887411117554,
- "eval_loss": 0.468018501996994,
- "eval_mae": 0.11912700533866882,
- "eval_rmse": 0.16002707183361053,
- "eval_runtime": 63.2971,
- "eval_samples_per_second": 37.19,
- "eval_steps_per_second": 2.338,
- "learning_rate": 1e-05,
  "step": 16206
  },
  {
  "epoch": 37.67123287671233,
- "grad_norm": 0.15865331888198853,
- "learning_rate": 1e-05,
- "loss": 0.4625,
  "step": 16500
  },
  {
  "epoch": 38.0,
- "eval_explained_variance": 0.45025742053985596,
- "eval_kl_divergence": 0.43029093742370605,
- "eval_loss": 0.46701580286026,
- "eval_mae": 0.1186341941356659,
- "eval_rmse": 0.15923398733139038,
- "eval_runtime": 63.0229,
- "eval_samples_per_second": 37.351,
- "eval_steps_per_second": 2.348,
- "learning_rate": 1e-05,
  "step": 16644
  },
  {
  "epoch": 38.81278538812786,
- "grad_norm": 0.2913804352283478,
- "learning_rate": 1e-05,
- "loss": 0.4608,
  "step": 17000
  },
  {
  "epoch": 39.0,
- "eval_explained_variance": 0.448851078748703,
- "eval_kl_divergence": 0.4562166929244995,
- "eval_loss": 0.4673251509666443,
- "eval_mae": 0.11870113760232925,
- "eval_rmse": 0.1596096307039261,
- "eval_runtime": 63.132,
- "eval_samples_per_second": 37.287,
- "eval_steps_per_second": 2.344,
- "learning_rate": 1e-05,
  "step": 17082
  },
  {
  "epoch": 39.954337899543376,
- "grad_norm": 0.1813182234764099,
- "learning_rate": 1e-05,
- "loss": 0.4614,
  "step": 17500
  },
  {
  "epoch": 40.0,
- "eval_explained_variance": 0.449774831533432,
- "eval_kl_divergence": 0.40653547644615173,
- "eval_loss": 0.4673212468624115,
- "eval_mae": 0.1188703179359436,
- "eval_rmse": 0.15939703583717346,
- "eval_runtime": 65.2215,
- "eval_samples_per_second": 36.092,
- "eval_steps_per_second": 2.269,
- "learning_rate": 1e-05,
  "step": 17520
  },
  {
  "epoch": 41.0,
- "eval_explained_variance": 0.4507579803466797,
- "eval_kl_divergence": 0.3335873782634735,
- "eval_loss": 0.4677547216415405,
- "eval_mae": 0.12059084326028824,
- "eval_rmse": 0.159872904419899,
- "eval_runtime": 65.9882,
- "eval_samples_per_second": 35.673,
- "eval_steps_per_second": 2.243,
- "learning_rate": 1e-05,
  "step": 17958
  },
  {
  "epoch": 41.0958904109589,
- "grad_norm": 0.1584874391555786,
- "learning_rate": 1e-05,
- "loss": 0.4608,
  "step": 18000
  },
  {
  "epoch": 42.0,
- "eval_explained_variance": 0.4486294686794281,
- "eval_kl_divergence": 0.5311685800552368,
- "eval_loss": 0.4671882390975952,
- "eval_mae": 0.1177595853805542,
- "eval_rmse": 0.15967120230197906,
- "eval_runtime": 65.4501,
- "eval_samples_per_second": 35.966,
- "eval_steps_per_second": 2.261,
- "learning_rate": 1e-05,
  "step": 18396
  },
  {
  "epoch": 42.23744292237443,
- "grad_norm": 0.17140232026576996,
- "learning_rate": 1e-05,
- "loss": 0.4615,
  "step": 18500
  },
  {
  "epoch": 43.0,
- "eval_explained_variance": 0.45157137513160706,
- "eval_kl_divergence": 0.3923657536506653,
- "eval_loss": 0.46716412901878357,
- "eval_mae": 0.1185157299041748,
- "eval_rmse": 0.1592295914888382,
- "eval_runtime": 64.3671,
- "eval_samples_per_second": 36.571,
- "eval_steps_per_second": 2.299,
- "learning_rate": 1e-05,
  "step": 18834
  },
  {
  "epoch": 43.37899543378995,
- "grad_norm": 0.12803754210472107,
- "learning_rate": 1e-05,
- "loss": 0.4601,
  "step": 19000
  },
  {
  "epoch": 44.0,
- "eval_explained_variance": 0.44912728667259216,
- "eval_kl_divergence": 0.4258858561515808,
- "eval_loss": 0.4678168296813965,
- "eval_mae": 0.11944716423749924,
- "eval_rmse": 0.16020986437797546,
- "eval_runtime": 65.6519,
- "eval_samples_per_second": 35.856,
- "eval_steps_per_second": 2.254,
- "learning_rate": 1e-05,
  "step": 19272
  },
  {
  "epoch": 44.52054794520548,
- "grad_norm": 0.12536858022212982,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4602,
  "step": 19500
  },
  {
  "epoch": 45.0,
- "eval_explained_variance": 0.4489940404891968,
- "eval_kl_divergence": 0.5214298367500305,
- "eval_loss": 0.46699702739715576,
- "eval_mae": 0.11719372868537903,
- "eval_rmse": 0.15936775505542755,
- "eval_runtime": 64.8181,
- "eval_samples_per_second": 36.317,
- "eval_steps_per_second": 2.283,
- "learning_rate": 1.0000000000000002e-06,
  "step": 19710
  },
  {
  "epoch": 45.662100456621005,
- "grad_norm": 0.12503549456596375,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4616,
  "step": 20000
  },
  {
  "epoch": 46.0,
- "eval_explained_variance": 0.45176592469215393,
- "eval_kl_divergence": 0.4174787700176239,
- "eval_loss": 0.46712958812713623,
- "eval_mae": 0.11880326271057129,
- "eval_rmse": 0.1593877524137497,
- "eval_runtime": 64.134,
- "eval_samples_per_second": 36.704,
- "eval_steps_per_second": 2.308,
- "learning_rate": 1.0000000000000002e-06,
  "step": 20148
  },
  {
  "epoch": 46.80365296803653,
- "grad_norm": 0.1746779829263687,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4602,
  "step": 20500
  },
  {
  "epoch": 47.0,
- "eval_explained_variance": 0.4524901807308197,
- "eval_kl_divergence": 0.4446321427822113,
- "eval_loss": 0.4666382074356079,
- "eval_mae": 0.11884639412164688,
- "eval_rmse": 0.15886224806308746,
- "eval_runtime": 68.911,
- "eval_samples_per_second": 34.16,
- "eval_steps_per_second": 2.148,
- "learning_rate": 1.0000000000000002e-06,
  "step": 20586
  },
  {
  "epoch": 47.945205479452056,
- "grad_norm": 0.18253998458385468,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4604,
  "step": 21000
  },
  {
  "epoch": 48.0,
- "eval_explained_variance": 0.44860827922821045,
- "eval_kl_divergence": 0.5755118727684021,
- "eval_loss": 0.46714723110198975,
- "eval_mae": 0.11802936345338821,
- "eval_rmse": 0.15972274541854858,
- "eval_runtime": 68.5695,
- "eval_samples_per_second": 34.33,
- "eval_steps_per_second": 2.158,
- "learning_rate": 1.0000000000000002e-06,
  "step": 21024
  },
  {
  "epoch": 49.0,
- "eval_explained_variance": 0.4494647979736328,
- "eval_kl_divergence": 0.4303589463233948,
- "eval_loss": 0.46758702397346497,
- "eval_mae": 0.11922705173492432,
- "eval_rmse": 0.15995512902736664,
- "eval_runtime": 68.4997,
- "eval_samples_per_second": 34.365,
- "eval_steps_per_second": 2.161,
- "learning_rate": 1.0000000000000002e-06,
  "step": 21462
  },
  {
  "epoch": 49.08675799086758,
- "grad_norm": 0.1836538463830948,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4606,
  "step": 21500
  },
  {
  "epoch": 50.0,
- "eval_explained_variance": 0.4534037411212921,
- "eval_kl_divergence": 0.33374354243278503,
- "eval_loss": 0.46752068400382996,
- "eval_mae": 0.12040751427412033,
- "eval_rmse": 0.15945331752300262,
- "eval_runtime": 67.7842,
- "eval_samples_per_second": 34.728,
- "eval_steps_per_second": 2.183,
- "learning_rate": 1.0000000000000002e-06,
  "step": 21900
  },
  {
  "epoch": 50.22831050228311,
- "grad_norm": 0.18452928960323334,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4598,
  "step": 22000
  },
  {
  "epoch": 51.0,
- "eval_explained_variance": 0.4523892402648926,
- "eval_kl_divergence": 0.395465224981308,
- "eval_loss": 0.46691644191741943,
- "eval_mae": 0.11809410899877548,
- "eval_rmse": 0.1590944528579712,
- "eval_runtime": 68.2629,
- "eval_samples_per_second": 34.484,
- "eval_steps_per_second": 2.168,
- "learning_rate": 1.0000000000000002e-06,
  "step": 22338
  },
  {
  "epoch": 51.36986301369863,
- "grad_norm": 0.1816985160112381,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4602,
  "step": 22500
  },
  {
  "epoch": 52.0,
- "eval_explained_variance": 0.45300889015197754,
- "eval_kl_divergence": 0.4761090576648712,
- "eval_loss": 0.466439425945282,
- "eval_mae": 0.1174706444144249,
- "eval_rmse": 0.15875311195850372,
- "eval_runtime": 68.2396,
- "eval_samples_per_second": 34.496,
- "eval_steps_per_second": 2.169,
- "learning_rate": 1.0000000000000002e-06,
  "step": 22776
  },
  {
  "epoch": 52.51141552511415,
- "grad_norm": 0.17806819081306458,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.462,
  "step": 23000
  },
  {
  "epoch": 53.0,
- "eval_explained_variance": 0.45259252190589905,
- "eval_kl_divergence": 0.43274176120758057,
- "eval_loss": 0.4667709469795227,
- "eval_mae": 0.11889918893575668,
- "eval_rmse": 0.15901200473308563,
- "eval_runtime": 66.8799,
- "eval_samples_per_second": 35.197,
- "eval_steps_per_second": 2.213,
- "learning_rate": 1.0000000000000002e-06,
  "step": 23214
  },
  {
  "epoch": 53.65296803652968,
- "grad_norm": 0.18054644763469696,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4604,
  "step": 23500
  },
  {
  "epoch": 54.0,
- "eval_explained_variance": 0.4532507658004761,
- "eval_kl_divergence": 0.3724806606769562,
- "eval_loss": 0.46701404452323914,
- "eval_mae": 0.11868719011545181,
- "eval_rmse": 0.15923155844211578,
- "eval_runtime": 73.556,
- "eval_samples_per_second": 32.003,
- "eval_steps_per_second": 2.012,
- "learning_rate": 1.0000000000000002e-06,
  "step": 23652
  },
  {
  "epoch": 54.794520547945204,
- "grad_norm": 0.26471829414367676,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.461,
  "step": 24000
  },
  {
  "epoch": 55.0,
- "eval_explained_variance": 0.45088374614715576,
- "eval_kl_divergence": 0.38409897685050964,
- "eval_loss": 0.467383474111557,
- "eval_mae": 0.11990005522966385,
- "eval_rmse": 0.1595049947500229,
- "eval_runtime": 70.451,
- "eval_samples_per_second": 33.413,
- "eval_steps_per_second": 2.101,
- "learning_rate": 1.0000000000000002e-06,
  "step": 24090
  },
  {
  "epoch": 55.93607305936073,
- "grad_norm": 0.2783886194229126,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4599,
  "step": 24500
  },
  {
  "epoch": 56.0,
- "eval_explained_variance": 0.45115411281585693,
- "eval_kl_divergence": 0.3821828067302704,
- "eval_loss": 0.46739572286605835,
- "eval_mae": 0.11897724121809006,
- "eval_rmse": 0.15964223444461823,
- "eval_runtime": 69.6578,
- "eval_samples_per_second": 33.794,
- "eval_steps_per_second": 2.125,
- "learning_rate": 1.0000000000000002e-06,
  "step": 24528
  },
  {
  "epoch": 57.0,
- "eval_explained_variance": 0.4505263864994049,
- "eval_kl_divergence": 0.4674541652202606,
- "eval_loss": 0.46702033281326294,
- "eval_mae": 0.1185864806175232,
- "eval_rmse": 0.15932416915893555,
- "eval_runtime": 67.4689,
- "eval_samples_per_second": 34.89,
- "eval_steps_per_second": 2.194,
- "learning_rate": 1.0000000000000002e-06,
  "step": 24966
  },
  {
  "epoch": 57.077625570776256,
- "grad_norm": 0.16562320291996002,
- "learning_rate": 1.0000000000000002e-06,
- "loss": 0.4594,
  "step": 25000
  },
  {
  "epoch": 58.0,
- "eval_explained_variance": 0.4521506726741791,
- "eval_kl_divergence": 0.37376847863197327,
- "eval_loss": 0.46735846996307373,
- "eval_mae": 0.11891353130340576,
- "eval_rmse": 0.15956538915634155,
- "eval_runtime": 68.6492,
- "eval_samples_per_second": 34.29,
- "eval_steps_per_second": 2.156,
- "learning_rate": 1.0000000000000002e-06,
  "step": 25404
  },
  {
  "epoch": 58.21917808219178,
- "grad_norm": 0.21171259880065918,
- "learning_rate": 1.0000000000000002e-07,
- "loss": 0.4613,
  "step": 25500
  },
  {
  "epoch": 59.0,
- "eval_explained_variance": 0.45357391238212585,
- "eval_kl_divergence": 0.4204346239566803,
- "eval_loss": 0.46666717529296875,
- "eval_mae": 0.11845538765192032,
- "eval_rmse": 0.1589372605085373,
- "eval_runtime": 69.2012,
- "eval_samples_per_second": 34.017,
- "eval_steps_per_second": 2.139,
- "learning_rate": 1.0000000000000002e-07,
  "step": 25842
  },
  {
  "epoch": 59.36073059360731,
- "grad_norm": 0.1960112601518631,
- "learning_rate": 1.0000000000000002e-07,
- "loss": 0.4607,
  "step": 26000
  },
  {
  "epoch": 60.0,
- "eval_explained_variance": 0.4513193368911743,
- "eval_kl_divergence": 0.45320600271224976,
- "eval_loss": 0.46685320138931274,
- "eval_mae": 0.11779770255088806,
- "eval_rmse": 0.15917657315731049,
- "eval_runtime": 71.4331,
- "eval_samples_per_second": 32.954,
- "eval_steps_per_second": 2.072,
- "learning_rate": 1.0000000000000002e-07,
  "step": 26280
  },
  {
  "epoch": 60.50228310502283,
- "grad_norm": 0.2178792506456375,
- "learning_rate": 1.0000000000000002e-07,
- "loss": 0.4613,
  "step": 26500
  },
  {
  "epoch": 61.0,
- "eval_explained_variance": 0.45110437273979187,
- "eval_kl_divergence": 0.40322577953338623,
- "eval_loss": 0.46734780073165894,
- "eval_mae": 0.11893540620803833,
- "eval_rmse": 0.1595635712146759,
- "eval_runtime": 69.3534,
- "eval_samples_per_second": 33.942,
- "eval_steps_per_second": 2.134,
- "learning_rate": 1.0000000000000002e-07,
  "step": 26718
  },
  {
  "epoch": 61.64383561643836,
- "grad_norm": 0.16740958392620087,
- "learning_rate": 1.0000000000000002e-07,
- "loss": 0.4598,
  "step": 27000
  },
  {
  "epoch": 62.0,
- "eval_explained_variance": 0.4526772201061249,
- "eval_kl_divergence": 0.3406714200973511,
- "eval_loss": 0.4673011302947998,
- "eval_mae": 0.11888447403907776,
- "eval_rmse": 0.1594574898481369,
- "eval_runtime": 70.4024,
- "eval_samples_per_second": 33.436,
- "eval_steps_per_second": 2.102,
- "learning_rate": 1.0000000000000002e-07,
  "step": 27156
  },
  {
  "epoch": 62.0,
- "learning_rate": 1.0000000000000002e-07,
  "step": 27156,
  "total_flos": 6.42634409963284e+19,
- "train_loss": 0.466335079458891,
- "train_runtime": 17194.6751,
- "train_samples_per_second": 61.092,
- "train_steps_per_second": 3.821
  }
  ],
  "logging_steps": 500,
 
  {
+ "best_metric": 0.40528106689453125,
  "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/drone/drone-DinoVdeau-from-binary-large-2024_11_14-batch-size16_freeze_probs/checkpoint-22776",
  "epoch": 62.0,
  "eval_steps": 500,
 
  "log_history": [
  {
  "epoch": 1.0,
+ "eval_explained_variance": 0.2881631553173065,
+ "eval_kl_divergence": 1.006906509399414,
+ "eval_loss": 0.43063807487487793,
+ "eval_mae": 0.16208958625793457,
+ "eval_rmse": 0.22103922069072723,
+ "eval_runtime": 65.2687,
+ "eval_samples_per_second": 36.066,
+ "eval_steps_per_second": 2.268,
  "learning_rate": 0.001,
  "step": 438
  },
  {
  "epoch": 1.1415525114155252,
+ "grad_norm": 0.5616265535354614,
  "learning_rate": 0.001,
+ "loss": 0.4808,
  "step": 500
  },
  {
  "epoch": 2.0,
+ "eval_explained_variance": 0.31177183985710144,
+ "eval_kl_divergence": 1.3118820190429688,
+ "eval_loss": 0.4245865046977997,
+ "eval_mae": 0.15473191440105438,
+ "eval_rmse": 0.21785493195056915,
+ "eval_runtime": 70.2445,
+ "eval_samples_per_second": 33.512,
+ "eval_steps_per_second": 2.107,
  "learning_rate": 0.001,
  "step": 876
  },
  {
  "epoch": 2.2831050228310503,
+ "grad_norm": 0.5421963930130005,
  "learning_rate": 0.001,
+ "loss": 0.421,
  "step": 1000
  },
  {
  "epoch": 3.0,
+ "eval_explained_variance": 0.3191607892513275,
+ "eval_kl_divergence": 1.0982407331466675,
+ "eval_loss": 0.422325998544693,
+ "eval_mae": 0.1554209440946579,
+ "eval_rmse": 0.21583305299282074,
+ "eval_runtime": 63.3078,
+ "eval_samples_per_second": 37.183,
+ "eval_steps_per_second": 2.338,
  "learning_rate": 0.001,
  "step": 1314
  },
  {
  "epoch": 3.4246575342465753,
+ "grad_norm": 0.4156647324562073,
  "learning_rate": 0.001,
+ "loss": 0.4151,
  "step": 1500
  },
  {
  "epoch": 4.0,
+ "eval_explained_variance": 0.3350948095321655,
+ "eval_kl_divergence": 1.041384220123291,
+ "eval_loss": 0.41912660002708435,
+ "eval_mae": 0.15517595410346985,
+ "eval_rmse": 0.21416835486888885,
+ "eval_runtime": 63.7743,
+ "eval_samples_per_second": 36.911,
+ "eval_steps_per_second": 2.321,
  "learning_rate": 0.001,
  "step": 1752
  },
  {
  "epoch": 4.566210045662101,
+ "grad_norm": 0.2765987813472748,
  "learning_rate": 0.001,
+ "loss": 0.4114,
  "step": 2000
  },
  {
  "epoch": 5.0,
+ "eval_explained_variance": 0.33842501044273376,
+ "eval_kl_divergence": 1.0698424577713013,
+ "eval_loss": 0.41713497042655945,
+ "eval_mae": 0.15411676466464996,
+ "eval_rmse": 0.21232052147388458,
+ "eval_runtime": 61.7723,
+ "eval_samples_per_second": 38.108,
+ "eval_steps_per_second": 2.396,
  "learning_rate": 0.001,
  "step": 2190
  },
  {
  "epoch": 5.707762557077626,
+ "grad_norm": 0.34299173951148987,
  "learning_rate": 0.001,
+ "loss": 0.4089,
  "step": 2500
  },
  {
  "epoch": 6.0,
+ "eval_explained_variance": 0.3310842514038086,
+ "eval_kl_divergence": 1.1958788633346558,
+ "eval_loss": 0.42093637585639954,
+ "eval_mae": 0.1519818753004074,
+ "eval_rmse": 0.21403205394744873,
+ "eval_runtime": 61.4619,
+ "eval_samples_per_second": 38.3,
+ "eval_steps_per_second": 2.408,
  "learning_rate": 0.001,
  "step": 2628
  },
  {
  "epoch": 6.8493150684931505,
+ "grad_norm": 0.30921000242233276,
  "learning_rate": 0.001,
+ "loss": 0.4091,
  "step": 3000
  },
  {
  "epoch": 7.0,
+ "eval_explained_variance": 0.33822229504585266,
+ "eval_kl_divergence": 1.1708621978759766,
+ "eval_loss": 0.4166290760040283,
+ "eval_mae": 0.153007373213768,
+ "eval_rmse": 0.21260716021060944,
+ "eval_runtime": 60.3411,
+ "eval_samples_per_second": 39.012,
+ "eval_steps_per_second": 2.453,
  "learning_rate": 0.001,
  "step": 3066
  },
  {
  "epoch": 7.9908675799086755,
+ "grad_norm": 0.21716275811195374,
  "learning_rate": 0.001,
+ "loss": 0.4071,
  "step": 3500
  },
  {
  "epoch": 8.0,
+ "eval_explained_variance": 0.33456894755363464,
+ "eval_kl_divergence": 0.971220850944519,
+ "eval_loss": 0.41946443915367126,
+ "eval_mae": 0.15562371909618378,
+ "eval_rmse": 0.2142825573682785,
+ "eval_runtime": 62.8353,
+ "eval_samples_per_second": 37.463,
+ "eval_steps_per_second": 2.355,
  "learning_rate": 0.001,
  "step": 3504
  },
  {
  "epoch": 9.0,
+ "eval_explained_variance": 0.3415004014968872,
+ "eval_kl_divergence": 1.1432474851608276,
+ "eval_loss": 0.41668570041656494,
+ "eval_mae": 0.1524006426334381,
+ "eval_rmse": 0.21208135783672333,
+ "eval_runtime": 62.325,
+ "eval_samples_per_second": 37.77,
+ "eval_steps_per_second": 2.375,
  "learning_rate": 0.001,
  "step": 3942
  },
  {
  "epoch": 9.132420091324201,
+ "grad_norm": 0.2371012270450592,
  "learning_rate": 0.001,
+ "loss": 0.4062,
  "step": 4000
  },
  {
  "epoch": 10.0,
+ "eval_explained_variance": 0.34203192591667175,
+ "eval_kl_divergence": 0.9120630025863647,
+ "eval_loss": 0.4186115860939026,
+ "eval_mae": 0.15351708233356476,
+ "eval_rmse": 0.2138604372739792,
+ "eval_runtime": 60.5397,
+ "eval_samples_per_second": 38.884,
+ "eval_steps_per_second": 2.445,
  "learning_rate": 0.001,
  "step": 4380
  },
  {
  "epoch": 10.273972602739725,
+ "grad_norm": 0.2552158236503601,
  "learning_rate": 0.001,
+ "loss": 0.4052,
  "step": 4500
  },
  {
  "epoch": 11.0,
+ "eval_explained_variance": 0.34416234493255615,
+ "eval_kl_divergence": 0.995019793510437,
+ "eval_loss": 0.41557687520980835,
+ "eval_mae": 0.15356659889221191,
+ "eval_rmse": 0.2114415019750595,
+ "eval_runtime": 61.7293,
+ "eval_samples_per_second": 38.134,
+ "eval_steps_per_second": 2.398,
  "learning_rate": 0.001,
  "step": 4818
  },
  {
  "epoch": 11.415525114155251,
+ "grad_norm": 0.20953956246376038,
  "learning_rate": 0.001,
+ "loss": 0.406,
  "step": 5000
  },
  {
  "epoch": 12.0,
+ "eval_explained_variance": 0.3389909565448761,
+ "eval_kl_divergence": 1.0105773210525513,
+ "eval_loss": 0.41883811354637146,
+ "eval_mae": 0.1555173546075821,
+ "eval_rmse": 0.21388684213161469,
+ "eval_runtime": 62.5745,
+ "eval_samples_per_second": 37.619,
+ "eval_steps_per_second": 2.365,
  "learning_rate": 0.001,
  "step": 5256
  },
  {
  "epoch": 12.557077625570777,
+ "grad_norm": 0.18659397959709167,
  "learning_rate": 0.001,
+ "loss": 0.4058,
  "step": 5500
  },
  {
  "epoch": 13.0,
+ "eval_explained_variance": 0.34248629212379456,
+ "eval_kl_divergence": 1.1481796503067017,
+ "eval_loss": 0.41630858182907104,
+ "eval_mae": 0.15531976521015167,
+ "eval_rmse": 0.21213315427303314,
+ "eval_runtime": 61.6003,
+ "eval_samples_per_second": 38.214,
+ "eval_steps_per_second": 2.403,
  "learning_rate": 0.001,
  "step": 5694
  },
  {
  "epoch": 13.698630136986301,
+ "grad_norm": 0.19523686170578003,
  "learning_rate": 0.001,
+ "loss": 0.4056,
  "step": 6000
  },
  {
  "epoch": 14.0,
+ "eval_explained_variance": 0.3286344110965729,
+ "eval_kl_divergence": 1.211091160774231,
+ "eval_loss": 0.4193180799484253,
+ "eval_mae": 0.15458153188228607,
+ "eval_rmse": 0.21381880342960358,
+ "eval_runtime": 62.0339,
+ "eval_samples_per_second": 37.947,
+ "eval_steps_per_second": 2.386,
  "learning_rate": 0.001,
  "step": 6132
  },
  {
  "epoch": 14.840182648401827,
+ "grad_norm": 0.18541939556598663,
  "learning_rate": 0.001,
+ "loss": 0.4033,
  "step": 6500
  },
  {
  "epoch": 15.0,
+ "eval_explained_variance": 0.3402325212955475,
+ "eval_kl_divergence": 1.2042615413665771,
+ "eval_loss": 0.416218638420105,
+ "eval_mae": 0.15419499576091766,
+ "eval_rmse": 0.2121332883834839,
+ "eval_runtime": 62.9591,
+ "eval_samples_per_second": 37.389,
+ "eval_steps_per_second": 2.351,
  "learning_rate": 0.001,
  "step": 6570
  },
297
  {
  "epoch": 15.981735159817351,
+ "grad_norm": 0.16085268557071686,
+ "learning_rate": 0.001,
+ "loss": 0.4057,
  "step": 7000
  },
  {
  "epoch": 16.0,
+ "eval_explained_variance": 0.35001620650291443,
+ "eval_kl_divergence": 1.0827727317810059,
+ "eval_loss": 0.41389620304107666,
+ "eval_mae": 0.1527981460094452,
+ "eval_rmse": 0.21022744476795197,
+ "eval_runtime": 62.4108,
+ "eval_samples_per_second": 37.718,
+ "eval_steps_per_second": 2.371,
+ "learning_rate": 0.001,
  "step": 7008
  },
  {
  "epoch": 17.0,
+ "eval_explained_variance": 0.3429690897464752,
+ "eval_kl_divergence": 1.0005594491958618,
+ "eval_loss": 0.4171081781387329,
+ "eval_mae": 0.15638333559036255,
+ "eval_rmse": 0.21180683374404907,
+ "eval_runtime": 63.4048,
+ "eval_samples_per_second": 37.127,
+ "eval_steps_per_second": 2.334,
+ "learning_rate": 0.001,
  "step": 7446
  },
  {
  "epoch": 17.123287671232877,
+ "grad_norm": 0.17030780017375946,
+ "learning_rate": 0.001,
+ "loss": 0.405,
  "step": 7500
  },
  {
  "epoch": 18.0,
+ "eval_explained_variance": 0.3499327600002289,
+ "eval_kl_divergence": 1.0514436960220337,
+ "eval_loss": 0.4146382212638855,
+ "eval_mae": 0.1507440060377121,
+ "eval_rmse": 0.2107054442167282,
+ "eval_runtime": 64.4758,
+ "eval_samples_per_second": 36.51,
+ "eval_steps_per_second": 2.295,
+ "learning_rate": 0.001,
  "step": 7884
  },
  {
  "epoch": 18.264840182648403,
+ "grad_norm": 0.16620762646198273,
+ "learning_rate": 0.001,
+ "loss": 0.4035,
  "step": 8000
  },
  {
  "epoch": 19.0,
+ "eval_explained_variance": 0.3467938005924225,
+ "eval_kl_divergence": 0.9575299024581909,
+ "eval_loss": 0.41857486963272095,
+ "eval_mae": 0.1531781703233719,
+ "eval_rmse": 0.21135376393795013,
+ "eval_runtime": 65.1272,
+ "eval_samples_per_second": 36.145,
+ "eval_steps_per_second": 2.272,
+ "learning_rate": 0.001,
  "step": 8322
  },
  {
  "epoch": 19.40639269406393,
+ "grad_norm": 0.21431417763233185,
+ "learning_rate": 0.001,
+ "loss": 0.4031,
  "step": 8500
  },
  {
  "epoch": 20.0,
+ "eval_explained_variance": 0.34868308901786804,
+ "eval_kl_divergence": 1.164780855178833,
+ "eval_loss": 0.41434723138809204,
+ "eval_mae": 0.15129883587360382,
+ "eval_rmse": 0.21083922684192657,
+ "eval_runtime": 62.809,
+ "eval_samples_per_second": 37.479,
+ "eval_steps_per_second": 2.356,
+ "learning_rate": 0.001,
  "step": 8760
  },
  {
  "epoch": 20.54794520547945,
+ "grad_norm": 0.16674350202083588,
+ "learning_rate": 0.001,
+ "loss": 0.4048,
  "step": 9000
  },
  {
  "epoch": 21.0,
+ "eval_explained_variance": 0.3385157585144043,
+ "eval_kl_divergence": 1.2949873208999634,
+ "eval_loss": 0.4195358157157898,
+ "eval_mae": 0.15333952009677887,
+ "eval_rmse": 0.21233241260051727,
+ "eval_runtime": 62.2788,
+ "eval_samples_per_second": 37.798,
+ "eval_steps_per_second": 2.376,
+ "learning_rate": 0.001,
  "step": 9198
  },
  {
  "epoch": 21.689497716894977,
+ "grad_norm": 0.2121485322713852,
+ "learning_rate": 0.001,
+ "loss": 0.4055,
  "step": 9500
  },
  {
  "epoch": 22.0,
+ "eval_explained_variance": 0.34627434611320496,
+ "eval_kl_divergence": Infinity,
+ "eval_loss": 0.4339658319950104,
+ "eval_mae": 0.15240180492401123,
+ "eval_rmse": 0.21100641787052155,
+ "eval_runtime": 63.2767,
+ "eval_samples_per_second": 37.202,
+ "eval_steps_per_second": 2.339,
+ "learning_rate": 0.001,
  "step": 9636
  },
  {
  "epoch": 22.831050228310502,
+ "grad_norm": 0.17502234876155853,
  "learning_rate": 0.0001,
+ "loss": 0.4022,
  "step": 10000
  },
  {
  "epoch": 23.0,
+ "eval_explained_variance": 0.362075537443161,
+ "eval_kl_divergence": NaN,
+ "eval_loss": 0.43265336751937866,
+ "eval_mae": 0.1517171412706375,
+ "eval_rmse": 0.2084527164697647,
+ "eval_runtime": 61.7803,
+ "eval_samples_per_second": 38.103,
+ "eval_steps_per_second": 2.396,
  "learning_rate": 0.0001,
  "step": 10074
  },
  {
  "epoch": 23.972602739726028,
+ "grad_norm": 0.20596392452716827,
  "learning_rate": 0.0001,
+ "loss": 0.3978,
  "step": 10500
  },
  {
  "epoch": 24.0,
+ "eval_explained_variance": 0.3582542836666107,
+ "eval_kl_divergence": NaN,
+ "eval_loss": 0.4384593963623047,
+ "eval_mae": 0.14925144612789154,
+ "eval_rmse": 0.20924808084964752,
+ "eval_runtime": 62.266,
+ "eval_samples_per_second": 37.806,
+ "eval_steps_per_second": 2.377,
  "learning_rate": 0.0001,
  "step": 10512
  },
  {
  "epoch": 25.0,
+ "eval_explained_variance": 0.3649435043334961,
+ "eval_kl_divergence": Infinity,
+ "eval_loss": 0.4271779954433441,
+ "eval_mae": 0.14897416532039642,
+ "eval_rmse": 0.20736177265644073,
+ "eval_runtime": 63.0259,
+ "eval_samples_per_second": 37.35,
+ "eval_steps_per_second": 2.348,
  "learning_rate": 0.0001,
  "step": 10950
  },
  {
  "epoch": 25.114155251141554,
+ "grad_norm": 0.14978627860546112,
  "learning_rate": 0.0001,
+ "loss": 0.3988,
  "step": 11000
  },
  {
  "epoch": 26.0,
+ "eval_explained_variance": 0.36444517970085144,
+ "eval_kl_divergence": 1.1902661323547363,
+ "eval_loss": 0.41048941016197205,
+ "eval_mae": 0.148028165102005,
+ "eval_rmse": 0.20754428207874298,
+ "eval_runtime": 62.2088,
+ "eval_samples_per_second": 37.84,
+ "eval_steps_per_second": 2.379,
  "learning_rate": 0.0001,
  "step": 11388
  },
  {
  "epoch": 26.255707762557076,
+ "grad_norm": 0.13278695940971375,
  "learning_rate": 0.0001,
+ "loss": 0.3958,
  "step": 11500
  },
  {
  "epoch": 27.0,
+ "eval_explained_variance": 0.3687790632247925,
+ "eval_kl_divergence": 0.9915334582328796,
+ "eval_loss": 0.4096038341522217,
+ "eval_mae": 0.1493707150220871,
+ "eval_rmse": 0.20674215257167816,
+ "eval_runtime": 63.9932,
+ "eval_samples_per_second": 36.785,
+ "eval_steps_per_second": 2.313,
  "learning_rate": 0.0001,
  "step": 11826
  },
  {
  "epoch": 27.397260273972602,
+ "grad_norm": 0.16862636804580688,
  "learning_rate": 0.0001,
+ "loss": 0.3965,
  "step": 12000
  },
  {
  "epoch": 28.0,
+ "eval_explained_variance": 0.3680773675441742,
+ "eval_kl_divergence": 0.9668822288513184,
+ "eval_loss": 0.4104350507259369,
+ "eval_mae": 0.1493188589811325,
+ "eval_rmse": 0.20746104419231415,
+ "eval_runtime": 64.0647,
+ "eval_samples_per_second": 36.744,
+ "eval_steps_per_second": 2.31,
  "learning_rate": 0.0001,
  "step": 12264
  },
  {
  "epoch": 28.538812785388128,
+ "grad_norm": 0.16052192449569702,
  "learning_rate": 0.0001,
+ "loss": 0.396,
  "step": 12500
  },
  {
  "epoch": 29.0,
+ "eval_explained_variance": 0.3695773184299469,
+ "eval_kl_divergence": 1.0432541370391846,
+ "eval_loss": 0.40966179966926575,
+ "eval_mae": 0.1468651443719864,
+ "eval_rmse": 0.20694835484027863,
+ "eval_runtime": 63.2767,
+ "eval_samples_per_second": 37.202,
+ "eval_steps_per_second": 2.339,
  "learning_rate": 0.0001,
  "step": 12702
  },
  {
  "epoch": 29.680365296803654,
+ "grad_norm": 0.14418508112430573,
  "learning_rate": 0.0001,
+ "loss": 0.3936,
  "step": 13000
  },
  {
  "epoch": 30.0,
+ "eval_explained_variance": 0.373136430978775,
+ "eval_kl_divergence": 0.908222496509552,
+ "eval_loss": 0.4094092547893524,
+ "eval_mae": 0.14899054169654846,
+ "eval_rmse": 0.20645444095134735,
+ "eval_runtime": 62.5038,
+ "eval_samples_per_second": 37.662,
+ "eval_steps_per_second": 2.368,
  "learning_rate": 0.0001,
  "step": 13140
  },
  {
  "epoch": 30.82191780821918,
+ "grad_norm": 0.19649599492549896,
  "learning_rate": 0.0001,
+ "loss": 0.3944,
  "step": 13500
  },
  {
  "epoch": 31.0,
+ "eval_explained_variance": 0.3705109655857086,
+ "eval_kl_divergence": 1.0120004415512085,
+ "eval_loss": 0.40909385681152344,
+ "eval_mae": 0.14699043333530426,
+ "eval_rmse": 0.20654882490634918,
+ "eval_runtime": 63.2971,
+ "eval_samples_per_second": 37.19,
+ "eval_steps_per_second": 2.338,
  "learning_rate": 0.0001,
  "step": 13578
  },
  {
  "epoch": 31.963470319634702,
+ "grad_norm": 0.228424534201622,
+ "learning_rate": 0.0001,
+ "loss": 0.3941,
  "step": 14000
  },
  {
  "epoch": 32.0,
+ "eval_explained_variance": 0.37417080998420715,
+ "eval_kl_divergence": 0.9708234071731567,
+ "eval_loss": 0.4084269404411316,
+ "eval_mae": 0.14826728403568268,
+ "eval_rmse": 0.2059999257326126,
+ "eval_runtime": 64.3761,
+ "eval_samples_per_second": 36.566,
+ "eval_steps_per_second": 2.299,
+ "learning_rate": 0.0001,
  "step": 14016
  },
  {
  "epoch": 33.0,
+ "eval_explained_variance": 0.37551748752593994,
+ "eval_kl_divergence": 0.9317126870155334,
+ "eval_loss": 0.40824124217033386,
+ "eval_mae": 0.14738227427005768,
+ "eval_rmse": 0.20570062100887299,
+ "eval_runtime": 63.4848,
+ "eval_samples_per_second": 37.08,
+ "eval_steps_per_second": 2.331,
+ "learning_rate": 0.0001,
  "step": 14454
  },
  {
  "epoch": 33.10502283105023,
+ "grad_norm": 0.2595873773097992,
+ "learning_rate": 0.0001,
+ "loss": 0.3933,
  "step": 14500
  },
  {
  "epoch": 34.0,
+ "eval_explained_variance": 0.37467464804649353,
+ "eval_kl_divergence": 0.9618669748306274,
+ "eval_loss": 0.40851354598999023,
+ "eval_mae": 0.14805640280246735,
+ "eval_rmse": 0.20609329640865326,
+ "eval_runtime": 65.3615,
+ "eval_samples_per_second": 36.015,
+ "eval_steps_per_second": 2.264,
+ "learning_rate": 0.0001,
  "step": 14892
  },
  {
  "epoch": 34.24657534246575,
+ "grad_norm": 0.26568445563316345,
+ "learning_rate": 0.0001,
+ "loss": 0.3926,
  "step": 15000
  },
  {
  "epoch": 35.0,
+ "eval_explained_variance": 0.375776082277298,
+ "eval_kl_divergence": 1.0522711277008057,
+ "eval_loss": 0.4072923958301544,
+ "eval_mae": 0.14664247632026672,
+ "eval_rmse": 0.20538650453090668,
+ "eval_runtime": 64.7697,
+ "eval_samples_per_second": 36.344,
+ "eval_steps_per_second": 2.285,
+ "learning_rate": 0.0001,
  "step": 15330
  },
  {
  "epoch": 35.38812785388128,
+ "grad_norm": 0.15931576490402222,
+ "learning_rate": 0.0001,
+ "loss": 0.3936,
  "step": 15500
  },
  {
  "epoch": 36.0,
+ "eval_explained_variance": 0.3770906925201416,
+ "eval_kl_divergence": 1.0621892213821411,
+ "eval_loss": 0.40741708874702454,
+ "eval_mae": 0.1460237056016922,
+ "eval_rmse": 0.20519912242889404,
+ "eval_runtime": 64.23,
+ "eval_samples_per_second": 36.65,
+ "eval_steps_per_second": 2.304,
+ "learning_rate": 0.0001,
  "step": 15768
  },
  {
  "epoch": 36.529680365296805,
+ "grad_norm": 0.22164444625377655,
+ "learning_rate": 0.0001,
+ "loss": 0.3935,
  "step": 16000
  },
  {
  "epoch": 37.0,
+ "eval_explained_variance": 0.38024798035621643,
+ "eval_kl_divergence": 1.020066261291504,
+ "eval_loss": 0.40657544136047363,
+ "eval_mae": 0.1456020027399063,
+ "eval_rmse": 0.20468135178089142,
+ "eval_runtime": 63.8016,
+ "eval_samples_per_second": 36.896,
+ "eval_steps_per_second": 2.32,
+ "learning_rate": 0.0001,
  "step": 16206
  },
  {
  "epoch": 37.67123287671233,
+ "grad_norm": 0.2097047120332718,
+ "learning_rate": 0.0001,
+ "loss": 0.3927,
  "step": 16500
  },
  {
  "epoch": 38.0,
+ "eval_explained_variance": 0.3799835741519928,
+ "eval_kl_divergence": 1.0557153224945068,
+ "eval_loss": 0.406360387802124,
+ "eval_mae": 0.14585663378238678,
+ "eval_rmse": 0.20454762876033783,
+ "eval_runtime": 63.2021,
+ "eval_samples_per_second": 37.246,
+ "eval_steps_per_second": 2.342,
+ "learning_rate": 0.0001,
  "step": 16644
  },
  {
  "epoch": 38.81278538812786,
+ "grad_norm": 0.34068891406059265,
+ "learning_rate": 0.0001,
+ "loss": 0.392,
  "step": 17000
  },
  {
  "epoch": 39.0,
+ "eval_explained_variance": 0.377095103263855,
+ "eval_kl_divergence": 1.005536675453186,
+ "eval_loss": 0.4077896773815155,
+ "eval_mae": 0.14692139625549316,
+ "eval_rmse": 0.2055957317352295,
+ "eval_runtime": 62.5136,
+ "eval_samples_per_second": 37.656,
+ "eval_steps_per_second": 2.367,
+ "learning_rate": 0.0001,
  "step": 17082
  },
  {
  "epoch": 39.954337899543376,
+ "grad_norm": 0.23111671209335327,
+ "learning_rate": 0.0001,
+ "loss": 0.3915,
  "step": 17500
  },
  {
  "epoch": 40.0,
+ "eval_explained_variance": 0.38054999709129333,
+ "eval_kl_divergence": 0.9849128723144531,
+ "eval_loss": 0.4068063199520111,
+ "eval_mae": 0.14637430012226105,
+ "eval_rmse": 0.20490336418151855,
+ "eval_runtime": 62.8552,
+ "eval_samples_per_second": 37.451,
+ "eval_steps_per_second": 2.355,
+ "learning_rate": 0.0001,
  "step": 17520
  },
  {
  "epoch": 41.0,
+ "eval_explained_variance": 0.3777576982975006,
+ "eval_kl_divergence": 0.899895191192627,
+ "eval_loss": 0.40890073776245117,
+ "eval_mae": 0.1488751471042633,
+ "eval_rmse": 0.20631897449493408,
+ "eval_runtime": 63.9481,
+ "eval_samples_per_second": 36.811,
+ "eval_steps_per_second": 2.314,
+ "learning_rate": 0.0001,
  "step": 17958
  },
  {
  "epoch": 41.0958904109589,
+ "grad_norm": 0.28402578830718994,
+ "learning_rate": 0.0001,
+ "loss": 0.3907,
  "step": 18000
  },
  {
  "epoch": 42.0,
+ "eval_explained_variance": 0.37971171736717224,
+ "eval_kl_divergence": 1.0616570711135864,
+ "eval_loss": 0.4068816602230072,
+ "eval_mae": 0.14634381234645844,
+ "eval_rmse": 0.20491831004619598,
+ "eval_runtime": 63.1884,
+ "eval_samples_per_second": 37.254,
+ "eval_steps_per_second": 2.342,
+ "learning_rate": 0.0001,
  "step": 18396
  },
  {
  "epoch": 42.23744292237443,
+ "grad_norm": 0.24103382229804993,
+ "learning_rate": 0.0001,
+ "loss": 0.3919,
  "step": 18500
  },
  {
  "epoch": 43.0,
+ "eval_explained_variance": 0.3829738199710846,
+ "eval_kl_divergence": 1.0520097017288208,
+ "eval_loss": 0.40578988194465637,
+ "eval_mae": 0.14498426020145416,
+ "eval_rmse": 0.2040938138961792,
+ "eval_runtime": 64.2301,
+ "eval_samples_per_second": 36.649,
+ "eval_steps_per_second": 2.304,
+ "learning_rate": 0.0001,
  "step": 18834
  },
  {
  "epoch": 43.37899543378995,
+ "grad_norm": 0.3461155891418457,
+ "learning_rate": 0.0001,
+ "loss": 0.3902,
  "step": 19000
  },
  {
  "epoch": 44.0,
+ "eval_explained_variance": 0.3809111416339874,
+ "eval_kl_divergence": 1.0053679943084717,
+ "eval_loss": 0.4070681035518646,
+ "eval_mae": 0.14748047292232513,
+ "eval_rmse": 0.20503848791122437,
+ "eval_runtime": 63.682,
+ "eval_samples_per_second": 36.965,
+ "eval_steps_per_second": 2.324,
+ "learning_rate": 0.0001,
  "step": 19272
  },
  {
  "epoch": 44.52054794520548,
+ "grad_norm": 0.21600213646888733,
+ "learning_rate": 0.0001,
+ "loss": 0.3896,
  "step": 19500
  },
  {
  "epoch": 45.0,
+ "eval_explained_variance": 0.38130107522010803,
+ "eval_kl_divergence": 1.13860285282135,
+ "eval_loss": 0.40669572353363037,
+ "eval_mae": 0.14402073621749878,
+ "eval_rmse": 0.2047145813703537,
+ "eval_runtime": 61.9143,
+ "eval_samples_per_second": 38.02,
+ "eval_steps_per_second": 2.39,
+ "learning_rate": 0.0001,
  "step": 19710
  },
  {
  "epoch": 45.662100456621005,
+ "grad_norm": 0.2100251168012619,
+ "learning_rate": 0.0001,
+ "loss": 0.3925,
  "step": 20000
  },
  {
  "epoch": 46.0,
+ "eval_explained_variance": 0.3830677270889282,
+ "eval_kl_divergence": 1.0252840518951416,
+ "eval_loss": 0.40670666098594666,
+ "eval_mae": 0.14572028815746307,
+ "eval_rmse": 0.20469875633716583,
+ "eval_runtime": 61.3533,
+ "eval_samples_per_second": 38.368,
+ "eval_steps_per_second": 2.412,
+ "learning_rate": 0.0001,
  "step": 20148
  },
  {
  "epoch": 46.80365296803653,
+ "grad_norm": 0.16854612529277802,
+ "learning_rate": 0.0001,
+ "loss": 0.3896,
  "step": 20500
  },
  {
  "epoch": 47.0,
+ "eval_explained_variance": 0.3834179639816284,
+ "eval_kl_divergence": 1.0430312156677246,
+ "eval_loss": 0.4062415659427643,
+ "eval_mae": 0.14726205170154572,
+ "eval_rmse": 0.20429861545562744,
+ "eval_runtime": 62.7532,
+ "eval_samples_per_second": 37.512,
+ "eval_steps_per_second": 2.358,
+ "learning_rate": 0.0001,
  "step": 20586
  },
  {
  "epoch": 47.945205479452056,
+ "grad_norm": 0.2040056735277176,
+ "learning_rate": 0.0001,
+ "loss": 0.3902,
  "step": 21000
  },
  {
  "epoch": 48.0,
+ "eval_explained_variance": 0.38119378685951233,
+ "eval_kl_divergence": 1.104145884513855,
+ "eval_loss": 0.4064981937408447,
+ "eval_mae": 0.14571230113506317,
+ "eval_rmse": 0.20479492843151093,
+ "eval_runtime": 66.5743,
+ "eval_samples_per_second": 35.359,
+ "eval_steps_per_second": 2.223,
+ "learning_rate": 0.0001,
  "step": 21024
  },
  {
  "epoch": 49.0,
+ "eval_explained_variance": 0.37976840138435364,
+ "eval_kl_divergence": 1.0702213048934937,
+ "eval_loss": 0.40709760785102844,
+ "eval_mae": 0.14625640213489532,
+ "eval_rmse": 0.20520327985286713,
+ "eval_runtime": 62.1191,
+ "eval_samples_per_second": 37.895,
+ "eval_steps_per_second": 2.383,
+ "learning_rate": 0.0001,
  "step": 21462
  },
  {
  "epoch": 49.08675799086758,
+ "grad_norm": 0.2242765724658966,
+ "learning_rate": 1e-05,
+ "loss": 0.3897,
  "step": 21500
  },
  {
  "epoch": 50.0,
+ "eval_explained_variance": 0.38569536805152893,
+ "eval_kl_divergence": 0.8917386531829834,
+ "eval_loss": 0.40644556283950806,
+ "eval_mae": 0.1479080468416214,
+ "eval_rmse": 0.2042473703622818,
+ "eval_runtime": 62.3011,
+ "eval_samples_per_second": 37.784,
+ "eval_steps_per_second": 2.376,
+ "learning_rate": 1e-05,
  "step": 21900
  },
  {
  "epoch": 50.22831050228311,
+ "grad_norm": 0.21291576325893402,
+ "learning_rate": 1e-05,
+ "loss": 0.3875,
  "step": 22000
  },
  {
  "epoch": 51.0,
+ "eval_explained_variance": 0.3844810426235199,
+ "eval_kl_divergence": 0.9960101842880249,
+ "eval_loss": 0.40579161047935486,
+ "eval_mae": 0.14372152090072632,
+ "eval_rmse": 0.20405276119709015,
+ "eval_runtime": 61.2114,
+ "eval_samples_per_second": 38.457,
+ "eval_steps_per_second": 2.418,
+ "learning_rate": 1e-05,
  "step": 22338
  },
  {
  "epoch": 51.36986301369863,
+ "grad_norm": 0.24317112565040588,
+ "learning_rate": 1e-05,
+ "loss": 0.3874,
  "step": 22500
  },
  {
  "epoch": 52.0,
+ "eval_explained_variance": 0.385125994682312,
+ "eval_kl_divergence": 1.0567286014556885,
+ "eval_loss": 0.40528106689453125,
+ "eval_mae": 0.14458806812763214,
+ "eval_rmse": 0.20368923246860504,
+ "eval_runtime": 62.8042,
+ "eval_samples_per_second": 37.482,
+ "eval_steps_per_second": 2.357,
+ "learning_rate": 1e-05,
  "step": 22776
  },
  {
  "epoch": 52.51141552511415,
+ "grad_norm": 0.30417612195014954,
+ "learning_rate": 1e-05,
+ "loss": 0.3899,
  "step": 23000
  },
  {
  "epoch": 53.0,
+ "eval_explained_variance": 0.3858625590801239,
+ "eval_kl_divergence": 1.0205212831497192,
+ "eval_loss": 0.4056229293346405,
+ "eval_mae": 0.14624176919460297,
+ "eval_rmse": 0.20387189090251923,
+ "eval_runtime": 62.9117,
+ "eval_samples_per_second": 37.418,
+ "eval_steps_per_second": 2.353,
+ "learning_rate": 1e-05,
  "step": 23214
  },
  {
  "epoch": 53.65296803652968,
+ "grad_norm": 0.24982061982154846,
+ "learning_rate": 1e-05,
+ "loss": 0.3892,
  "step": 23500
  },
  {
  "epoch": 54.0,
+ "eval_explained_variance": 0.3853992521762848,
+ "eval_kl_divergence": 0.9905322194099426,
+ "eval_loss": 0.4058997631072998,
+ "eval_mae": 0.14412301778793335,
+ "eval_rmse": 0.20410750806331635,
+ "eval_runtime": 63.4824,
+ "eval_samples_per_second": 37.081,
+ "eval_steps_per_second": 2.331,
+ "learning_rate": 1e-05,
  "step": 23652
  },
  {
  "epoch": 54.794520547945204,
+ "grad_norm": 0.2903271019458771,
+ "learning_rate": 1e-05,
+ "loss": 0.3892,
  "step": 24000
  },
  {
  "epoch": 55.0,
+ "eval_explained_variance": 0.38560736179351807,
+ "eval_kl_divergence": 0.937917947769165,
+ "eval_loss": 0.4060685932636261,
+ "eval_mae": 0.1471087485551834,
+ "eval_rmse": 0.20407529175281525,
+ "eval_runtime": 64.4026,
+ "eval_samples_per_second": 36.551,
+ "eval_steps_per_second": 2.298,
+ "learning_rate": 1e-05,
  "step": 24090
  },
  {
  "epoch": 55.93607305936073,
+ "grad_norm": 0.2701994776725769,
+ "learning_rate": 1e-05,
+ "loss": 0.3869,
  "step": 24500
  },
  {
  "epoch": 56.0,
+ "eval_explained_variance": 0.3853694200515747,
+ "eval_kl_divergence": 0.9695614576339722,
+ "eval_loss": 0.40592971444129944,
+ "eval_mae": 0.14540034532546997,
+ "eval_rmse": 0.20410047471523285,
+ "eval_runtime": 63.4818,
+ "eval_samples_per_second": 37.081,
+ "eval_steps_per_second": 2.331,
+ "learning_rate": 1e-05,
  "step": 24528
  },
  {
  "epoch": 57.0,
+ "eval_explained_variance": 0.3842361867427826,
+ "eval_kl_divergence": 1.0590680837631226,
+ "eval_loss": 0.4058408737182617,
+ "eval_mae": 0.1459987610578537,
+ "eval_rmse": 0.20412230491638184,
+ "eval_runtime": 62.5651,
+ "eval_samples_per_second": 37.625,
+ "eval_steps_per_second": 2.366,
+ "learning_rate": 1e-05,
  "step": 24966
  },
  {
  "epoch": 57.077625570776256,
+ "grad_norm": 0.20055490732192993,
+ "learning_rate": 1e-05,
+ "loss": 0.3874,
  "step": 25000
  },
  {
  "epoch": 58.0,
+ "eval_explained_variance": 0.38601794838905334,
+ "eval_kl_divergence": 0.9275628328323364,
+ "eval_loss": 0.4063320457935333,
+ "eval_mae": 0.14603658020496368,
+ "eval_rmse": 0.20428447425365448,
+ "eval_runtime": 62.6353,
+ "eval_samples_per_second": 37.583,
+ "eval_steps_per_second": 2.363,
+ "learning_rate": 1e-05,
  "step": 25404
  },
  {
  "epoch": 58.21917808219178,
+ "grad_norm": 0.24670056998729706,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.3887,
  "step": 25500
  },
  {
  "epoch": 59.0,
+ "eval_explained_variance": 0.3867626488208771,
+ "eval_kl_divergence": 0.9793874621391296,
+ "eval_loss": 0.4056239724159241,
+ "eval_mae": 0.14530591666698456,
+ "eval_rmse": 0.20382745563983917,
+ "eval_runtime": 63.6318,
+ "eval_samples_per_second": 36.994,
+ "eval_steps_per_second": 2.326,
+ "learning_rate": 1.0000000000000002e-06,
  "step": 25842
  },
  {
  "epoch": 59.36073059360731,
+ "grad_norm": 0.27373573184013367,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.3882,
  "step": 26000
  },
  {
  "epoch": 60.0,
+ "eval_explained_variance": 0.3851200044155121,
+ "eval_kl_divergence": 1.0348856449127197,
+ "eval_loss": 0.40571752190589905,
+ "eval_mae": 0.1446085125207901,
+ "eval_rmse": 0.20402370393276215,
+ "eval_runtime": 63.8531,
+ "eval_samples_per_second": 36.866,
+ "eval_steps_per_second": 2.318,
+ "learning_rate": 1.0000000000000002e-06,
  "step": 26280
  },
  {
  "epoch": 60.50228310502283,
+ "grad_norm": 0.23867332935333252,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.389,
  "step": 26500
  },
  {
  "epoch": 61.0,
+ "eval_explained_variance": 0.38573384284973145,
+ "eval_kl_divergence": 0.9859956502914429,
+ "eval_loss": 0.4058452248573303,
+ "eval_mae": 0.14494158327579498,
+ "eval_rmse": 0.2040751427412033,
+ "eval_runtime": 61.8751,
+ "eval_samples_per_second": 38.044,
+ "eval_steps_per_second": 2.392,
+ "learning_rate": 1.0000000000000002e-06,
  "step": 26718
  },
  {
  "epoch": 61.64383561643836,
+ "grad_norm": 0.21306726336479187,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.3882,
  "step": 27000
  },
  {
  "epoch": 62.0,
+ "eval_explained_variance": 0.3864554166793823,
+ "eval_kl_divergence": 0.9528394937515259,
+ "eval_loss": 0.4054276943206787,
+ "eval_mae": 0.14455263316631317,
+ "eval_rmse": 0.20368416607379913,
+ "eval_runtime": 61.7886,
+ "eval_samples_per_second": 38.098,
+ "eval_steps_per_second": 2.395,
+ "learning_rate": 1.0000000000000002e-06,
  "step": 27156
  },
  {
  "epoch": 62.0,
+ "learning_rate": 1.0000000000000002e-06,
  "step": 27156,
  "total_flos": 6.42634409963284e+19,
+ "train_loss": 0.3985773164651095,
+ "train_runtime": 16834.9641,
+ "train_samples_per_second": 62.397,
+ "train_steps_per_second": 3.903
  }
  ],
  "logging_steps": 500,