g8a9 committed on
Commit
c181d3a
1 Parent(s): 5f0655e

End of training

all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "epoch": 94.78,
+ "eval_accuracy": 0.4534356430493973,
+ "eval_loss": 3.1695425510406494,
+ "eval_runtime": 128.1811,
+ "eval_samples": 24055,
+ "eval_samples_per_second": 187.664,
+ "eval_steps_per_second": 5.867,
+ "perplexity": 23.796596138077017,
+ "train_loss": 3.832834072322636,
+ "train_runtime": 39205.3218,
+ "train_samples": 24910,
+ "train_samples_per_second": 63.537,
+ "train_steps_per_second": 0.122
+ }
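As a sanity check on the figures in all_results.json, the sketch below recomputes the reported throughput from the sample counts and runtimes. The helper name `derived` and the three inferred quantities (evaluation batch size, number of passes over the training set, number of optimizer steps) are assumptions introduced here for illustration; the diff does not state them, and the inference assumes the per-second rates were computed as a total count divided by the runtime.

```python
# Values copied from all_results.json in this commit.
metrics = {
    "eval_runtime": 128.1811,
    "eval_samples": 24055,
    "eval_samples_per_second": 187.664,
    "eval_steps_per_second": 5.867,
    "train_runtime": 39205.3218,
    "train_samples": 24910,
    "train_samples_per_second": 63.537,
    "train_steps_per_second": 0.122,
}


def derived(m):
    """Recompute one reported rate and infer quantities the file leaves implicit."""
    # Steps per second times runtime gives the (approximate) number of eval steps.
    eval_steps = m["eval_steps_per_second"] * m["eval_runtime"]
    return {
        # Should reproduce the reported eval_samples_per_second.
        "eval_samples_per_second": m["eval_samples"] / m["eval_runtime"],
        # Assumption: every eval step processes one batch of equal size.
        "eval_batch_size": m["eval_samples"] / eval_steps,
        # Assumption: rate * runtime is the total number of training samples counted.
        "train_passes": m["train_samples_per_second"] * m["train_runtime"] / m["train_samples"],
        # Assumption: rate * runtime is the total number of optimizer steps counted.
        "train_steps": m["train_steps_per_second"] * m["train_runtime"],
    }


summary = derived(metrics)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the numbers point to an evaluation batch of roughly 32 samples and a training budget of roughly 100 passes over the 24,910 training samples. The step estimate is coarse because `train_steps_per_second` is rounded to three decimals.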
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "epoch": 94.78,
+ "eval_accuracy": 0.4534356430493973,
+ "eval_loss": 3.1695425510406494,
+ "eval_runtime": 128.1811,
+ "eval_samples": 24055,
+ "eval_samples_per_second": 187.664,
+ "eval_steps_per_second": 5.867,
+ "perplexity": 23.796596138077017
+ }
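The `perplexity` field in eval_results.json appears to be the exponential of `eval_loss`. The short check below illustrates that relationship, assuming `eval_loss` is a mean cross-entropy measured in nats; the variable names are chosen here and are not part of the commit.

```python
import math

# Values copied from eval_results.json in this commit.
eval_loss = 3.1695425510406494
reported_perplexity = 23.796596138077017

# Assumption: perplexity = exp(mean cross-entropy in nats).
perplexity = math.exp(eval_loss)

# The same loss expressed in bits per predicted token, for comparison.
bits_per_token = eval_loss / math.log(2)

print(f"recomputed perplexity: {perplexity:.6f}")
print(f"reported perplexity:   {reported_perplexity:.6f}")
print(f"bits per token:        {bits_per_token:.4f}")
```

If the two perplexity values agree, the reported figure carries no information beyond `eval_loss` itself, so either one is enough when comparing checkpoints.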
runs/Feb15_12-10-24_monica.sm.unibocconi.it/events.out.tfevents.1676498906.monica.sm.unibocconi.it.747021.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2d8b00fa1c549c3efda41eec6f67bdd1f4bd9f37309a3cc712fb2fcbf5639ba
+ size 363
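The three lines added for the TensorBoard event file are a Git LFS pointer rather than the file contents: each line is a key followed by a space and a value. A minimal sketch of reading such a pointer is shown below; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not a function from any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer text copied from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e2d8b00fa1c549c3efda41eec6f67bdd1f4bd9f37309a3cc712fb2fcbf5639ba
size 363
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The pointer therefore says only that the stored event file is 363 bytes long and identified by the given SHA-256 digest.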
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 94.78,
+ "train_loss": 3.832834072322636,
+ "train_runtime": 39205.3218,
+ "train_samples": 24910,
+ "train_samples_per_second": 63.537,
+ "train_steps_per_second": 0.122
+ }
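The `learning_rate` values logged in trainer_state.json (also added in this commit) look like a linear warmup followed by a cosine decay. The sketch below tests that reading against a few logged points. The three constants are inferred from the logged values and are assumptions: the diff does not show the scheduler configuration, and `lr_at` is a hypothetical helper, not the training code.

```python
import math

# Assumptions inferred from the logged values, not stated in the commit.
PEAK_LR = 4e-4       # highest logged learning_rate (at step 50)
WARMUP_STEPS = 50    # the rate rises by 8e-05 every 10 steps up to step 50
TOTAL_STEPS = 4800   # step at which the fitted cosine decay would reach zero


def lr_at(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from log_history in trainer_state.json.
logged = {
    10: 8e-05,
    50: 0.0004,
    60: 0.000399995625676045,
    100: 0.00039989065146818525,
    1000: 0.0003618033988749895,
}

max_rel_err = max(abs(lr_at(step) - lr) / lr for step, lr in logged.items())
print(f"largest relative error over the sampled points: {max_rel_err:.2e}")
```

A small relative error across these points supports the warmup-plus-cosine reading, but it does not prove that the run was configured with exactly these constants.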
trainer_state.json ADDED
@@ -0,0 +1,3574 @@
+ {
+ "best_metric": 3.165127754211426,
+ "best_model_checkpoint": "/data1/attanasiog/babylm/roberta-tiny-2l-10M/checkpoint-4400",
+ "epoch": 94.78098908156711,
+ "global_step": 4550,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.21,
+ "learning_rate": 8e-05,
+ "loss": 10.5161,
+ "step": 10
+ },
+ {
+ "epoch": 0.41,
+ "learning_rate": 0.00016,
+ "loss": 9.1097,
+ "step": 20
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.00024,
+ "loss": 7.8514,
+ "step": 30
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.00032,
+ "loss": 7.3238,
+ "step": 40
+ },
+ {
+ "epoch": 1.04,
+ "learning_rate": 0.0004,
+ "loss": 7.7619,
+ "step": 50
+ },
+ {
+ "epoch": 1.04,
+ "eval_accuracy": 0.07476398255519703,
+ "eval_loss": 7.233829975128174,
+ "eval_runtime": 127.6677,
+ "eval_samples_per_second": 188.419,
+ "eval_steps_per_second": 5.89,
+ "step": 50
+ },
+ {
+ "epoch": 1.25,
+ "learning_rate": 0.000399995625676045,
+ "loss": 7.1368,
+ "step": 60
+ },
+ {
+ "epoch": 1.45,
+ "learning_rate": 0.0003999825028955268,
+ "loss": 6.9374,
+ "step": 70
+ },
+ {
+ "epoch": 1.66,
+ "learning_rate": 0.0003999606322324786,
+ "loss": 6.8035,
+ "step": 80
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 0.0003999300146435939,
+ "loss": 6.6903,
+ "step": 90
+ },
+ {
+ "epoch": 2.08,
+ "learning_rate": 0.00039989065146818525,
+ "loss": 7.0524,
+ "step": 100
+ },
+ {
+ "epoch": 2.08,
+ "eval_accuracy": 0.1331419040979615,
+ "eval_loss": 6.625187873840332,
+ "eval_runtime": 127.7949,
+ "eval_samples_per_second": 188.231,
+ "eval_steps_per_second": 5.884,
+ "step": 100
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 0.0003998425444281255,
+ "loss": 6.5528,
+ "step": 110
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 0.00039978569562777234,
+ "loss": 6.5145,
+ "step": 120
+ },
+ {
+ "epoch": 2.7,
+ "learning_rate": 0.0003997201075538765,
+ "loss": 6.4642,
+ "step": 130
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 0.0003996457830754729,
+ "loss": 6.4561,
+ "step": 140
+ },
+ {
+ "epoch": 3.12,
+ "learning_rate": 0.00039956272544375493,
+ "loss": 6.8423,
+ "step": 150
+ },
+ {
+ "epoch": 3.12,
+ "eval_accuracy": 0.14625706249076864,
+ "eval_loss": 6.462179183959961,
+ "eval_runtime": 127.8455,
+ "eval_samples_per_second": 188.157,
+ "eval_steps_per_second": 5.882,
+ "step": 150
+ },
+ {
+ "epoch": 3.33,
+ "learning_rate": 0.00039947093829193245,
+ "loss": 6.3841,
+ "step": 160
+ },
+ {
+ "epoch": 3.53,
+ "learning_rate": 0.00039937042563507283,
+ "loss": 6.3875,
+ "step": 170
+ },
+ {
+ "epoch": 3.74,
+ "learning_rate": 0.00039926119186992537,
+ "loss": 6.3843,
+ "step": 180
+ },
+ {
+ "epoch": 3.95,
+ "learning_rate": 0.0003991432417747288,
+ "loss": 6.3505,
+ "step": 190
+ },
+ {
+ "epoch": 4.16,
+ "learning_rate": 0.0003990165805090023,
+ "loss": 6.7298,
+ "step": 200
+ },
+ {
+ "epoch": 4.16,
+ "eval_accuracy": 0.1487513607434242,
+ "eval_loss": 6.397099018096924,
+ "eval_runtime": 127.9548,
+ "eval_samples_per_second": 187.996,
+ "eval_steps_per_second": 5.877,
+ "step": 200
+ },
+ {
+ "epoch": 4.37,
+ "learning_rate": 0.00039888121361332003,
+ "loss": 6.3075,
+ "step": 210
+ },
+ {
+ "epoch": 4.58,
+ "learning_rate": 0.0003987371470090686,
+ "loss": 6.3248,
+ "step": 220
+ },
+ {
+ "epoch": 4.78,
+ "learning_rate": 0.00039858438699818784,
+ "loss": 6.2949,
+ "step": 230
+ },
+ {
+ "epoch": 4.99,
+ "learning_rate": 0.0003984229402628956,
+ "loss": 6.2768,
+ "step": 240
+ },
+ {
+ "epoch": 5.21,
+ "learning_rate": 0.00039825281386539503,
+ "loss": 6.669,
+ "step": 250
+ },
+ {
+ "epoch": 5.21,
+ "eval_accuracy": 0.15192705186935002,
+ "eval_loss": 6.362815856933594,
+ "eval_runtime": 127.8986,
+ "eval_samples_per_second": 188.079,
+ "eval_steps_per_second": 5.88,
+ "step": 250
+ },
+ {
+ "epoch": 5.41,
+ "learning_rate": 0.000398074015247566,
+ "loss": 6.257,
+ "step": 260
+ },
+ {
+ "epoch": 5.62,
+ "learning_rate": 0.0003978865522306392,
+ "loss": 6.2485,
+ "step": 270
+ },
+ {
+ "epoch": 5.82,
+ "learning_rate": 0.0003976904330148543,
+ "loss": 6.252,
+ "step": 280
+ },
+ {
+ "epoch": 6.04,
+ "learning_rate": 0.00039748566617910113,
+ "loss": 6.6549,
+ "step": 290
+ },
+ {
+ "epoch": 6.25,
+ "learning_rate": 0.0003972722606805445,
+ "loss": 6.2038,
+ "step": 300
+ },
+ {
+ "epoch": 6.25,
+ "eval_accuracy": 0.15184154910887893,
+ "eval_loss": 6.337147235870361,
+ "eval_runtime": 128.0686,
+ "eval_samples_per_second": 187.829,
+ "eval_steps_per_second": 5.872,
+ "step": 300
+ },
+ {
+ "epoch": 6.45,
+ "learning_rate": 0.00039705022585423216,
+ "loss": 6.2262,
+ "step": 310
+ },
+ {
+ "epoch": 6.66,
+ "learning_rate": 0.0003968195714126868,
+ "loss": 6.2023,
+ "step": 320
+ },
+ {
+ "epoch": 6.86,
+ "learning_rate": 0.00039658030744548075,
+ "loss": 6.2053,
+ "step": 330
+ },
+ {
+ "epoch": 7.08,
+ "learning_rate": 0.0003963324444187952,
+ "loss": 6.6064,
+ "step": 340
+ },
+ {
+ "epoch": 7.29,
+ "learning_rate": 0.0003960759931749619,
+ "loss": 6.1783,
+ "step": 350
+ },
+ {
+ "epoch": 7.29,
+ "eval_accuracy": 0.15316276995162847,
+ "eval_loss": 6.311531066894531,
+ "eval_runtime": 127.9978,
+ "eval_samples_per_second": 187.933,
+ "eval_steps_per_second": 5.875,
+ "step": 350
+ },
+ {
+ "epoch": 7.49,
+ "learning_rate": 0.00039581096493198893,
+ "loss": 6.178,
+ "step": 360
+ },
+ {
+ "epoch": 7.7,
+ "learning_rate": 0.0003955373712830703,
+ "loss": 6.1784,
+ "step": 370
+ },
+ {
+ "epoch": 7.9,
+ "learning_rate": 0.00039525522419607854,
+ "loss": 6.1739,
+ "step": 380
+ },
+ {
+ "epoch": 8.12,
+ "learning_rate": 0.0003949645360130412,
+ "loss": 6.5644,
+ "step": 390
+ },
+ {
+ "epoch": 8.33,
+ "learning_rate": 0.0003946653194496012,
+ "loss": 6.1459,
+ "step": 400
+ },
+ {
+ "epoch": 8.33,
+ "eval_accuracy": 0.15298167344807118,
+ "eval_loss": 6.292238712310791,
+ "eval_runtime": 218.0011,
+ "eval_samples_per_second": 110.343,
+ "eval_steps_per_second": 3.45,
+ "step": 400
+ },
+ {
+ "epoch": 8.53,
+ "learning_rate": 0.00039435758759446025,
+ "loss": 6.1514,
+ "step": 410
+ },
+ {
+ "epoch": 8.74,
+ "learning_rate": 0.00039404135390880664,
+ "loss": 6.1335,
+ "step": 420
+ },
+ {
+ "epoch": 8.95,
+ "learning_rate": 0.0003937166322257262,
+ "loss": 6.1613,
+ "step": 430
+ },
+ {
+ "epoch": 9.16,
+ "learning_rate": 0.00039338343674959745,
+ "loss": 6.5555,
+ "step": 440
+ },
+ {
+ "epoch": 9.37,
+ "learning_rate": 0.00039304178205546976,
+ "loss": 6.1096,
+ "step": 450
+ },
+ {
+ "epoch": 9.37,
+ "eval_accuracy": 0.15364433508531855,
+ "eval_loss": 6.269557952880859,
+ "eval_runtime": 245.4855,
+ "eval_samples_per_second": 97.989,
+ "eval_steps_per_second": 3.063,
+ "step": 450
+ },
+ {
+ "epoch": 9.58,
+ "learning_rate": 0.00039269168308842634,
+ "loss": 6.1131,
+ "step": 460
+ },
+ {
+ "epoch": 9.78,
+ "learning_rate": 0.00039233315516293006,
+ "loss": 6.1172,
+ "step": 470
+ },
+ {
+ "epoch": 9.99,
+ "learning_rate": 0.00039196621396215403,
+ "loss": 6.0984,
+ "step": 480
+ },
+ {
+ "epoch": 10.21,
+ "learning_rate": 0.000391590875537295,
+ "loss": 6.494,
+ "step": 490
+ },
+ {
+ "epoch": 10.41,
+ "learning_rate": 0.00039120715630687155,
+ "loss": 6.0745,
+ "step": 500
+ },
+ {
+ "epoch": 10.41,
+ "eval_accuracy": 0.15413396308915142,
+ "eval_loss": 6.25447416305542,
+ "eval_runtime": 243.2903,
+ "eval_samples_per_second": 98.874,
+ "eval_steps_per_second": 3.091,
+ "step": 500
+ },
+ {
+ "epoch": 10.62,
+ "learning_rate": 0.000390815073056006,
+ "loss": 6.0953,
+ "step": 510
+ },
+ {
+ "epoch": 10.82,
+ "learning_rate": 0.00039041464293568983,
+ "loss": 6.0869,
+ "step": 520
+ },
+ {
+ "epoch": 11.04,
+ "learning_rate": 0.00039000588346203374,
+ "loss": 6.4846,
+ "step": 530
+ },
+ {
+ "epoch": 11.25,
+ "learning_rate": 0.0003895888125155014,
+ "loss": 6.0673,
+ "step": 540
+ },
+ {
+ "epoch": 11.45,
+ "learning_rate": 0.00038916344834012695,
+ "loss": 6.0689,
+ "step": 550
+ },
+ {
+ "epoch": 11.45,
+ "eval_accuracy": 0.15334541266013718,
+ "eval_loss": 6.24962854385376,
+ "eval_runtime": 238.4372,
+ "eval_samples_per_second": 100.886,
+ "eval_steps_per_second": 3.154,
+ "step": 550
+ },
+ {
+ "epoch": 11.66,
+ "learning_rate": 0.00038872980954271757,
+ "loss": 6.0805,
+ "step": 560
+ },
+ {
+ "epoch": 11.86,
+ "learning_rate": 0.00038828791509203895,
+ "loss": 6.0632,
+ "step": 570
+ },
+ {
+ "epoch": 12.08,
+ "learning_rate": 0.00038783778431798597,
+ "loss": 6.4656,
+ "step": 580
+ },
+ {
+ "epoch": 12.29,
+ "learning_rate": 0.0003873794369107369,
+ "loss": 6.0445,
+ "step": 590
+ },
+ {
+ "epoch": 12.49,
+ "learning_rate": 0.0003869128929198922,
+ "loss": 6.0562,
+ "step": 600
+ },
+ {
+ "epoch": 12.49,
+ "eval_accuracy": 0.15423183376396205,
+ "eval_loss": 6.231264114379883,
+ "eval_runtime": 244.3792,
+ "eval_samples_per_second": 98.433,
+ "eval_steps_per_second": 3.077,
+ "step": 600
+ },
+ {
+ "epoch": 12.7,
+ "learning_rate": 0.0003864381727535973,
+ "loss": 6.0553,
+ "step": 610
+ },
+ {
+ "epoch": 12.9,
+ "learning_rate": 0.00038595529717765027,
+ "loss": 6.0595,
+ "step": 620
+ },
+ {
+ "epoch": 13.12,
+ "learning_rate": 0.0003854642873145931,
+ "loss": 6.445,
+ "step": 630
+ },
+ {
+ "epoch": 13.33,
+ "learning_rate": 0.00038496516464278776,
+ "loss": 6.0285,
+ "step": 640
+ },
+ {
+ "epoch": 13.53,
+ "learning_rate": 0.00038445795099547697,
+ "loss": 6.0324,
+ "step": 650
+ },
+ {
+ "epoch": 13.53,
+ "eval_accuracy": 0.15358873091243086,
+ "eval_loss": 6.224751949310303,
+ "eval_runtime": 128.0464,
+ "eval_samples_per_second": 187.862,
+ "eval_steps_per_second": 5.873,
+ "step": 650
+ },
+ {
+ "epoch": 13.74,
+ "learning_rate": 0.0003839426685598287,
+ "loss": 6.0219,
+ "step": 660
+ },
+ {
+ "epoch": 13.95,
+ "learning_rate": 0.000383419339875966,
+ "loss": 6.0379,
+ "step": 670
+ },
+ {
+ "epoch": 14.16,
+ "learning_rate": 0.00038288798783598087,
+ "loss": 6.416,
+ "step": 680
+ },
+ {
+ "epoch": 14.37,
+ "learning_rate": 0.0003823486356829329,
+ "loss": 5.9984,
+ "step": 690
+ },
+ {
+ "epoch": 14.58,
+ "learning_rate": 0.0003818013070098325,
+ "loss": 5.9907,
+ "step": 700
+ },
+ {
+ "epoch": 14.58,
+ "eval_accuracy": 0.15438304013536042,
+ "eval_loss": 6.217936038970947,
+ "eval_runtime": 128.0604,
+ "eval_samples_per_second": 187.841,
+ "eval_steps_per_second": 5.872,
+ "step": 700
+ },
+ {
+ "epoch": 14.78,
+ "learning_rate": 0.0003812460257586089,
+ "loss": 6.0038,
+ "step": 710
+ },
+ {
+ "epoch": 14.99,
+ "learning_rate": 0.000380682816219063,
+ "loss": 6.0321,
+ "step": 720
+ },
+ {
+ "epoch": 15.21,
+ "learning_rate": 0.00038011170302780446,
+ "loss": 6.3685,
+ "step": 730
+ },
+ {
+ "epoch": 15.41,
+ "learning_rate": 0.00037953271116717444,
+ "loss": 5.9825,
+ "step": 740
+ },
+ {
+ "epoch": 15.62,
+ "learning_rate": 0.0003789458659641527,
+ "loss": 5.9683,
+ "step": 750
+ },
+ {
+ "epoch": 15.62,
+ "eval_accuracy": 0.1545422454380471,
+ "eval_loss": 6.183169364929199,
+ "eval_runtime": 128.0339,
+ "eval_samples_per_second": 187.88,
+ "eval_steps_per_second": 5.873,
+ "step": 750
+ },
+ {
+ "epoch": 15.82,
+ "learning_rate": 0.0003783511930892495,
+ "loss": 5.9712,
+ "step": 760
+ },
+ {
+ "epoch": 16.04,
+ "learning_rate": 0.00037774871855538275,
+ "loss": 6.355,
+ "step": 770
+ },
+ {
+ "epoch": 16.25,
+ "learning_rate": 0.00037713846871674045,
+ "loss": 5.9361,
+ "step": 780
+ },
+ {
+ "epoch": 16.45,
+ "learning_rate": 0.0003765204702676274,
+ "loss": 5.9281,
+ "step": 790
+ },
+ {
+ "epoch": 16.66,
+ "learning_rate": 0.0003758947502412978,
+ "loss": 5.9236,
+ "step": 800
+ },
+ {
+ "epoch": 16.66,
+ "eval_accuracy": 0.15502017774172816,
+ "eval_loss": 6.141255855560303,
+ "eval_runtime": 128.0268,
+ "eval_samples_per_second": 187.89,
+ "eval_steps_per_second": 5.874,
+ "step": 800
+ },
+ {
+ "epoch": 16.86,
+ "learning_rate": 0.0003752613360087727,
+ "loss": 5.9288,
+ "step": 810
+ },
+ {
+ "epoch": 17.08,
+ "learning_rate": 0.00037462025527764265,
+ "loss": 6.2842,
+ "step": 820
+ },
+ {
+ "epoch": 17.29,
+ "learning_rate": 0.00037397153609085553,
+ "loss": 5.8852,
+ "step": 830
+ },
+ {
+ "epoch": 17.49,
+ "learning_rate": 0.0003733152068254901,
+ "loss": 5.8779,
+ "step": 840
+ },
+ {
+ "epoch": 17.7,
+ "learning_rate": 0.00037265129619151483,
+ "loss": 5.8808,
+ "step": 850
+ },
+ {
+ "epoch": 17.7,
+ "eval_accuracy": 0.15577406052421716,
+ "eval_loss": 6.089950084686279,
+ "eval_runtime": 128.2202,
+ "eval_samples_per_second": 187.607,
+ "eval_steps_per_second": 5.865,
+ "step": 850
+ },
+ {
+ "epoch": 17.9,
+ "learning_rate": 0.00037197983323053143,
+ "loss": 5.871,
+ "step": 860
+ },
+ {
+ "epoch": 18.12,
+ "learning_rate": 0.00037130084731450515,
+ "loss": 6.2293,
+ "step": 870
+ },
+ {
+ "epoch": 18.33,
+ "learning_rate": 0.0003706143681444795,
+ "loss": 5.8282,
+ "step": 880
+ },
+ {
+ "epoch": 18.53,
+ "learning_rate": 0.0003699204257492774,
+ "loss": 5.85,
+ "step": 890
+ },
+ {
+ "epoch": 18.74,
+ "learning_rate": 0.0003692190504841871,
+ "loss": 5.8392,
+ "step": 900
+ },
+ {
+ "epoch": 18.74,
+ "eval_accuracy": 0.15657369591332176,
+ "eval_loss": 6.054327487945557,
+ "eval_runtime": 128.0437,
+ "eval_samples_per_second": 187.866,
+ "eval_steps_per_second": 5.873,
+ "step": 900
+ },
+ {
+ "epoch": 18.95,
+ "learning_rate": 0.00036851027302963493,
+ "loss": 5.8393,
+ "step": 910
+ },
+ {
+ "epoch": 19.16,
+ "learning_rate": 0.00036779412438984294,
+ "loss": 6.1961,
+ "step": 920
+ },
+ {
+ "epoch": 19.37,
+ "learning_rate": 0.0003670706358914725,
+ "loss": 5.8161,
+ "step": 930
+ },
+ {
+ "epoch": 19.58,
+ "learning_rate": 0.0003663398391822543,
+ "loss": 5.7886,
+ "step": 940
+ },
+ {
+ "epoch": 19.78,
+ "learning_rate": 0.00036560176622960403,
+ "loss": 5.7962,
+ "step": 950
+ },
+ {
+ "epoch": 19.78,
+ "eval_accuracy": 0.15750512966626293,
+ "eval_loss": 6.022204399108887,
+ "eval_runtime": 127.9436,
+ "eval_samples_per_second": 188.012,
+ "eval_steps_per_second": 5.878,
+ "step": 950
+ },
+ {
+ "epoch": 19.99,
+ "learning_rate": 0.00036485644931922353,
+ "loss": 5.7823,
+ "step": 960
+ },
+ {
+ "epoch": 20.21,
+ "learning_rate": 0.0003641039210536889,
+ "loss": 6.1533,
+ "step": 970
+ },
+ {
+ "epoch": 20.41,
+ "learning_rate": 0.0003633442143510245,
+ "loss": 5.7526,
+ "step": 980
+ },
+ {
+ "epoch": 20.62,
+ "learning_rate": 0.00036257736244326246,
+ "loss": 5.7454,
+ "step": 990
+ },
+ {
+ "epoch": 20.82,
+ "learning_rate": 0.0003618033988749895,
+ "loss": 5.7473,
+ "step": 1000
+ },
+ {
+ "epoch": 20.82,
+ "eval_accuracy": 0.16172566383651218,
+ "eval_loss": 5.947088718414307,
+ "eval_runtime": 128.1904,
+ "eval_samples_per_second": 187.65,
+ "eval_steps_per_second": 5.866,
+ "step": 1000
+ },
+ {
+ "epoch": 21.04,
+ "learning_rate": 0.0003610223575018795,
+ "loss": 6.0948,
+ "step": 1010
+ },
+ {
+ "epoch": 21.25,
+ "learning_rate": 0.00036023427248921215,
+ "loss": 5.6776,
+ "step": 1020
+ },
+ {
+ "epoch": 21.45,
+ "learning_rate": 0.0003594391783103792,
+ "loss": 5.6479,
+ "step": 1030
+ },
+ {
+ "epoch": 21.66,
+ "learning_rate": 0.00035863710974537563,
+ "loss": 5.6245,
+ "step": 1040
+ },
+ {
+ "epoch": 21.86,
+ "learning_rate": 0.00035782810187927875,
+ "loss": 5.5787,
+ "step": 1050
+ },
+ {
+ "epoch": 21.86,
+ "eval_accuracy": 0.18910485199927482,
+ "eval_loss": 5.7037835121154785,
+ "eval_runtime": 128.2758,
+ "eval_samples_per_second": 187.526,
+ "eval_steps_per_second": 5.862,
+ "step": 1050
+ },
+ {
+ "epoch": 22.08,
+ "learning_rate": 0.0003570121901007136,
+ "loss": 5.8678,
+ "step": 1060
+ },
+ {
+ "epoch": 22.29,
+ "learning_rate": 0.0003561894101003044,
+ "loss": 5.4587,
+ "step": 1070
+ },
+ {
+ "epoch": 22.49,
+ "learning_rate": 0.00035535979786911396,
+ "loss": 5.3982,
+ "step": 1080
+ },
+ {
+ "epoch": 22.7,
+ "learning_rate": 0.00035452338969706876,
+ "loss": 5.305,
+ "step": 1090
+ },
+ {
+ "epoch": 22.9,
+ "learning_rate": 0.00035368022217137184,
+ "loss": 5.2316,
+ "step": 1100
+ },
+ {
+ "epoch": 22.9,
+ "eval_accuracy": 0.23819718901662149,
+ "eval_loss": 5.270751476287842,
+ "eval_runtime": 128.1695,
+ "eval_samples_per_second": 187.681,
+ "eval_steps_per_second": 5.867,
+ "step": 1100
+ },
+ {
+ "epoch": 23.12,
+ "learning_rate": 0.00035283033217490227,
+ "loss": 5.4202,
+ "step": 1110
+ },
+ {
+ "epoch": 23.33,
+ "learning_rate": 0.00035197375688460176,
+ "loss": 4.9911,
+ "step": 1120
+ },
+ {
+ "epoch": 23.53,
+ "learning_rate": 0.0003511105337698484,
+ "loss": 4.8741,
+ "step": 1130
+ },
+ {
+ "epoch": 23.74,
+ "learning_rate": 0.0003502407005908177,
+ "loss": 4.7582,
+ "step": 1140
+ },
+ {
+ "epoch": 23.95,
+ "learning_rate": 0.0003493642953968308,
+ "loss": 4.6613,
+ "step": 1150
+ },
+ {
+ "epoch": 23.95,
+ "eval_accuracy": 0.29748286605712254,
+ "eval_loss": 4.707459926605225,
+ "eval_runtime": 131.3634,
+ "eval_samples_per_second": 183.118,
+ "eval_steps_per_second": 5.725,
+ "step": 1150
+ },
+ {
+ "epoch": 24.16,
+ "learning_rate": 0.00034848135652469,
+ "loss": 4.8536,
+ "step": 1160
+ },
+ {
+ "epoch": 24.37,
+ "learning_rate": 0.00034759192259700196,
+ "loss": 4.4822,
+ "step": 1170
+ },
+ {
+ "epoch": 24.58,
+ "learning_rate": 0.000346696032520488,
+ "loss": 4.4126,
+ "step": 1180
+ },
+ {
+ "epoch": 24.78,
+ "learning_rate": 0.00034579372548428235,
+ "loss": 4.3707,
+ "step": 1190
+ },
+ {
+ "epoch": 24.99,
+ "learning_rate": 0.00034488504095821784,
+ "loss": 4.3006,
+ "step": 1200
+ },
+ {
+ "epoch": 24.99,
+ "eval_accuracy": 0.3221731878424314,
+ "eval_loss": 4.417978763580322,
+ "eval_runtime": 144.7447,
+ "eval_samples_per_second": 166.189,
+ "eval_steps_per_second": 5.195,
+ "step": 1200
+ },
+ {
+ "epoch": 25.21,
+ "learning_rate": 0.0003439700186910993,
+ "loss": 4.5185,
+ "step": 1210
+ },
+ {
+ "epoch": 25.41,
+ "learning_rate": 0.00034304869870896513,
+ "loss": 4.2011,
+ "step": 1220
+ },
+ {
+ "epoch": 25.62,
+ "learning_rate": 0.00034212112131333587,
+ "loss": 4.1513,
+ "step": 1230
+ },
+ {
+ "epoch": 25.82,
+ "learning_rate": 0.0003411873270794518,
+ "loss": 4.1584,
+ "step": 1240
+ },
+ {
+ "epoch": 26.04,
+ "learning_rate": 0.00034024735685449773,
+ "loss": 4.3754,
+ "step": 1250
+ },
+ {
+ "epoch": 26.04,
+ "eval_accuracy": 0.33853883405739793,
+ "eval_loss": 4.238345146179199,
+ "eval_runtime": 144.4875,
+ "eval_samples_per_second": 166.485,
+ "eval_steps_per_second": 5.205,
+ "step": 1250
+ },
+ {
+ "epoch": 26.25,
+ "learning_rate": 0.00033930125175581647,
+ "loss": 4.0477,
+ "step": 1260
+ },
+ {
+ "epoch": 26.45,
+ "learning_rate": 0.0003383490531691099,
+ "loss": 4.0339,
+ "step": 1270
+ },
+ {
+ "epoch": 26.66,
+ "learning_rate": 0.0003373908027466289,
+ "loss": 4.0184,
+ "step": 1280
+ },
+ {
+ "epoch": 26.86,
+ "learning_rate": 0.00033642654240535134,
+ "loss": 3.9835,
+ "step": 1290
+ },
+ {
+ "epoch": 27.08,
+ "learning_rate": 0.00033545631432514825,
+ "loss": 4.2531,
+ "step": 1300
+ },
+ {
+ "epoch": 27.08,
+ "eval_accuracy": 0.34910193403738843,
+ "eval_loss": 4.1157379150390625,
+ "eval_runtime": 144.9052,
+ "eval_samples_per_second": 166.005,
+ "eval_steps_per_second": 5.19,
+ "step": 1300
+ },
+ {
+ "epoch": 27.29,
+ "learning_rate": 0.00033448016094693895,
+ "loss": 3.9085,
+ "step": 1310
+ },
+ {
+ "epoch": 27.49,
+ "learning_rate": 0.0003334981249708345,
+ "loss": 3.9205,
+ "step": 1320
+ },
+ {
+ "epoch": 27.7,
+ "learning_rate": 0.00033251024935427,
+ "loss": 3.8786,
+ "step": 1330
+ },
+ {
+ "epoch": 27.9,
+ "learning_rate": 0.0003315165773101249,
+ "loss": 3.8839,
+ "step": 1340
+ },
+ {
+ "epoch": 28.12,
+ "learning_rate": 0.00033051715230483374,
+ "loss": 4.0987,
+ "step": 1350
+ },
+ {
+ "epoch": 28.12,
+ "eval_accuracy": 0.3577664889956034,
+ "eval_loss": 4.0197062492370605,
+ "eval_runtime": 128.1918,
+ "eval_samples_per_second": 187.649,
+ "eval_steps_per_second": 5.866,
+ "step": 1350
+ },
+ {
+ "epoch": 28.33,
+ "learning_rate": 0.0003295120180564838,
+ "loss": 3.8043,
+ "step": 1360
+ },
+ {
+ "epoch": 28.53,
+ "learning_rate": 0.00032850121853290334,
+ "loss": 3.8048,
+ "step": 1370
+ },
+ {
+ "epoch": 28.74,
+ "learning_rate": 0.000327484797949738,
+ "loss": 3.8127,
+ "step": 1380
+ },
+ {
+ "epoch": 28.95,
+ "learning_rate": 0.00032646280076851684,
+ "loss": 3.7773,
+ "step": 1390
+ },
+ {
+ "epoch": 29.16,
+ "learning_rate": 0.0003254352716947074,
+ "loss": 4.0045,
+ "step": 1400
+ },
+ {
+ "epoch": 29.16,
+ "eval_accuracy": 0.3656404849021274,
+ "eval_loss": 3.9503591060638428,
+ "eval_runtime": 128.0514,
+ "eval_samples_per_second": 187.854,
+ "eval_steps_per_second": 5.873,
+ "step": 1400
+ },
1102
+ {
1103
+ "epoch": 29.37,
1104
+ "learning_rate": 0.0003244022556757602,
1105
+ "loss": 3.7454,
1106
+ "step": 1410
1107
+ },
1108
+ {
1109
+ "epoch": 29.58,
1110
+ "learning_rate": 0.0003233637978991422,
1111
+ "loss": 3.7151,
1112
+ "step": 1420
1113
+ },
1114
+ {
1115
+ "epoch": 29.78,
1116
+ "learning_rate": 0.00032231994379036086,
1117
+ "loss": 3.7243,
1118
+ "step": 1430
1119
+ },
1120
+ {
1121
+ "epoch": 29.99,
1122
+ "learning_rate": 0.0003212707390109765,
1123
+ "loss": 3.691,
1124
+ "step": 1440
1125
+ },
1126
+ {
1127
+ "epoch": 30.21,
1128
+ "learning_rate": 0.00032021622945660504,
1129
+ "loss": 3.9145,
1130
+ "step": 1450
1131
+ },
1132
+ {
1133
+ "epoch": 30.21,
1134
+ "eval_accuracy": 0.37178706581714294,
1135
+ "eval_loss": 3.8818981647491455,
1136
+ "eval_runtime": 128.0667,
1137
+ "eval_samples_per_second": 187.832,
1138
+ "eval_steps_per_second": 5.872,
1139
+ "step": 1450
1140
+ },
1141
+ {
1142
+ "epoch": 30.41,
1143
+ "learning_rate": 0.0003191564612549106,
1144
+ "loss": 3.6485,
1145
+ "step": 1460
1146
+ },
1147
+ {
1148
+ "epoch": 30.62,
1149
+ "learning_rate": 0.0003180914807635874,
1150
+ "loss": 3.6517,
1151
+ "step": 1470
1152
+ },
1153
+ {
1154
+ "epoch": 30.82,
1155
+ "learning_rate": 0.00031702133456833236,
1156
+ "loss": 3.6453,
1157
+ "step": 1480
1158
+ },
1159
+ {
1160
+ "epoch": 31.04,
1161
+ "learning_rate": 0.00031594606948080663,
1162
+ "loss": 3.8657,
1163
+ "step": 1490
1164
+ },
1165
+ {
1166
+ "epoch": 31.25,
1167
+ "learning_rate": 0.00031486573253658874,
1168
+ "loss": 3.5808,
1169
+ "step": 1500
1170
+ },
1171
+ {
1172
+ "epoch": 31.25,
1173
+ "eval_accuracy": 0.3780561563488311,
1174
+ "eval_loss": 3.827902317047119,
1175
+ "eval_runtime": 128.1177,
1176
+ "eval_samples_per_second": 187.757,
1177
+ "eval_steps_per_second": 5.87,
1178
+ "step": 1500
1179
+ },
1180
+ {
1181
+ "epoch": 31.45,
1182
+ "learning_rate": 0.00031378037099311627,
1183
+ "loss": 3.5983,
1184
+ "step": 1510
1185
+ },
1186
+ {
1187
+ "epoch": 31.66,
1188
+ "learning_rate": 0.00031269003232761933,
1189
+ "loss": 3.5947,
1190
+ "step": 1520
1191
+ },
1192
+ {
1193
+ "epoch": 31.86,
1194
+ "learning_rate": 0.0003115947642350433,
1195
+ "loss": 3.5773,
1196
+ "step": 1530
1197
+ },
1198
+ {
1199
+ "epoch": 32.08,
1200
+ "learning_rate": 0.00031049461462596267,
1201
+ "loss": 3.7941,
1202
+ "step": 1540
1203
+ },
1204
+ {
1205
+ "epoch": 32.29,
1206
+ "learning_rate": 0.00030938963162448544,
1207
+ "loss": 3.5354,
1208
+ "step": 1550
1209
+ },
1210
+ {
1211
+ "epoch": 32.29,
1212
+ "eval_accuracy": 0.3825753450782098,
1213
+ "eval_loss": 3.7829582691192627,
1214
+ "eval_runtime": 128.1881,
1215
+ "eval_samples_per_second": 187.654,
1216
+ "eval_steps_per_second": 5.866,
1217
+ "step": 1550
1218
+ },
1219
+ {
1220
+ "epoch": 32.49,
1221
+ "learning_rate": 0.0003082798635661476,
1222
+ "loss": 3.5355,
1223
+ "step": 1560
1224
+ },
1225
+ {
1226
+ "epoch": 32.7,
1227
+ "learning_rate": 0.0003071653589957993,
1228
+ "loss": 3.5166,
1229
+ "step": 1570
1230
+ },
1231
+ {
1232
+ "epoch": 32.9,
1233
+ "learning_rate": 0.000306046166665481,
1234
+ "loss": 3.5295,
1235
+ "step": 1580
1236
+ },
1237
+ {
1238
+ "epoch": 33.12,
1239
+ "learning_rate": 0.00030492233553229076,
1240
+ "loss": 3.7281,
1241
+ "step": 1590
1242
+ },
1243
+ {
1244
+ "epoch": 33.33,
1245
+ "learning_rate": 0.00030379391475624304,
1246
+ "loss": 3.4788,
1247
+ "step": 1600
1248
+ },
1249
+ {
1250
+ "epoch": 33.33,
1251
+ "eval_accuracy": 0.3872195585541312,
1252
+ "eval_loss": 3.7400357723236084,
1253
+ "eval_runtime": 128.395,
1254
+ "eval_samples_per_second": 187.352,
1255
+ "eval_steps_per_second": 5.857,
1256
+ "step": 1600
1257
+ },
1258
+ {
1259
+ "epoch": 33.53,
1260
+ "learning_rate": 0.0003026609536981183,
1261
+ "loss": 3.4722,
1262
+ "step": 1610
1263
+ },
1264
+ {
1265
+ "epoch": 33.74,
1266
+ "learning_rate": 0.0003015235019173034,
1267
+ "loss": 3.4888,
1268
+ "step": 1620
1269
+ },
1270
+ {
1271
+ "epoch": 33.95,
1272
+ "learning_rate": 0.00030038160916962404,
1273
+ "loss": 3.472,
1274
+ "step": 1630
1275
+ },
1276
+ {
1277
+ "epoch": 34.16,
1278
+ "learning_rate": 0.00029923532540516843,
1279
+ "loss": 3.6802,
1280
+ "step": 1640
1281
+ },
1282
+ {
1283
+ "epoch": 34.37,
1284
+ "learning_rate": 0.00029808470076610167,
1285
+ "loss": 3.4315,
1286
+ "step": 1650
1287
+ },
1288
+ {
1289
+ "epoch": 34.37,
1290
+ "eval_accuracy": 0.39113526163842305,
1291
+ "eval_loss": 3.702760696411133,
1292
+ "eval_runtime": 147.8935,
1293
+ "eval_samples_per_second": 162.651,
1294
+ "eval_steps_per_second": 5.085,
1295
+ "step": 1650
1296
+ },
1297
+ {
1298
+ "epoch": 34.58,
1299
+ "learning_rate": 0.00029692978558447305,
1300
+ "loss": 3.4258,
1301
+ "step": 1660
1302
+ },
1303
+ {
1304
+ "epoch": 34.78,
1305
+ "learning_rate": 0.0002957706303800139,
1306
+ "loss": 3.4325,
1307
+ "step": 1670
1308
+ },
1309
+ {
1310
+ "epoch": 34.99,
1311
+ "learning_rate": 0.0002946072858579282,
1312
+ "loss": 3.418,
1313
+ "step": 1680
1314
+ },
1315
+ {
1316
+ "epoch": 35.21,
1317
+ "learning_rate": 0.0002934398029066739,
1318
+ "loss": 3.6162,
1319
+ "step": 1690
1320
+ },
1321
+ {
1322
+ "epoch": 35.41,
1323
+ "learning_rate": 0.0002922682325957376,
1324
+ "loss": 3.3906,
1325
+ "step": 1700
1326
+ },
1327
+ {
1328
+ "epoch": 35.41,
1329
+ "eval_accuracy": 0.3955544302044244,
1330
+ "eval_loss": 3.6628527641296387,
1331
+ "eval_runtime": 147.4398,
1332
+ "eval_samples_per_second": 163.151,
1333
+ "eval_steps_per_second": 5.1,
1334
+ "step": 1700
1335
+ },
1336
+ {
1337
+ "epoch": 35.62,
1338
+ "learning_rate": 0.00029109262617339987,
1339
+ "loss": 3.3731,
1340
+ "step": 1710
1341
+ },
1342
+ {
1343
+ "epoch": 35.82,
1344
+ "learning_rate": 0.0002899130350644941,
1345
+ "loss": 3.381,
1346
+ "step": 1720
1347
+ },
1348
+ {
1349
+ "epoch": 36.04,
1350
+ "learning_rate": 0.00028872951086815685,
1351
+ "loss": 3.608,
1352
+ "step": 1730
1353
+ },
1354
+ {
1355
+ "epoch": 36.25,
1356
+ "learning_rate": 0.00028754210535557036,
1357
+ "loss": 3.3345,
1358
+ "step": 1740
1359
+ },
1360
+ {
1361
+ "epoch": 36.45,
1362
+ "learning_rate": 0.00028635087046769857,
1363
+ "loss": 3.3508,
1364
+ "step": 1750
1365
+ },
1366
+ {
1367
+ "epoch": 36.45,
1368
+ "eval_accuracy": 0.3984451691466028,
1369
+ "eval_loss": 3.6344361305236816,
1370
+ "eval_runtime": 147.716,
1371
+ "eval_samples_per_second": 162.846,
1372
+ "eval_steps_per_second": 5.091,
1373
+ "step": 1750
1374
+ },
1375
+ {
1376
+ "epoch": 36.66,
1377
+ "learning_rate": 0.00028515585831301456,
1378
+ "loss": 3.3503,
1379
+ "step": 1760
1380
+ },
1381
+ {
1382
+ "epoch": 36.86,
1383
+ "learning_rate": 0.0002839571211652212,
1384
+ "loss": 3.3494,
1385
+ "step": 1770
1386
+ },
1387
+ {
1388
+ "epoch": 37.08,
1389
+ "learning_rate": 0.00028275471146096466,
1390
+ "loss": 3.539,
1391
+ "step": 1780
1392
+ },
1393
+ {
1394
+ "epoch": 37.29,
1395
+ "learning_rate": 0.00028154868179754074,
1396
+ "loss": 3.3145,
1397
+ "step": 1790
1398
+ },
1399
+ {
1400
+ "epoch": 37.49,
1401
+ "learning_rate": 0.0002803390849305939,
1402
+ "loss": 3.288,
1403
+ "step": 1800
1404
+ },
1405
+ {
1406
+ "epoch": 37.49,
1407
+ "eval_accuracy": 0.4019129197606658,
1408
+ "eval_loss": 3.6045737266540527,
1409
+ "eval_runtime": 128.0819,
1410
+ "eval_samples_per_second": 187.81,
1411
+ "eval_steps_per_second": 5.871,
1412
+ "step": 1800
1413
+ },
1414
+ {
1415
+ "epoch": 37.7,
1416
+ "learning_rate": 0.0002791259737718097,
1417
+ "loss": 3.318,
1418
+ "step": 1810
1419
+ },
1420
+ {
1421
+ "epoch": 37.9,
1422
+ "learning_rate": 0.0002779094013866001,
1423
+ "loss": 3.3005,
1424
+ "step": 1820
1425
+ },
1426
+ {
1427
+ "epoch": 38.12,
1428
+ "learning_rate": 0.00027668942099178234,
1429
+ "loss": 3.4959,
1430
+ "step": 1830
1431
+ },
1432
+ {
1433
+ "epoch": 38.33,
1434
+ "learning_rate": 0.00027546608595325117,
1435
+ "loss": 3.2771,
1436
+ "step": 1840
1437
+ },
1438
+ {
1439
+ "epoch": 38.53,
1440
+ "learning_rate": 0.00027423944978364416,
1441
+ "loss": 3.2678,
1442
+ "step": 1850
1443
+ },
1444
+ {
1445
+ "epoch": 38.53,
1446
+ "eval_accuracy": 0.40528409278500616,
1447
+ "eval_loss": 3.5798938274383545,
1448
+ "eval_runtime": 128.148,
1449
+ "eval_samples_per_second": 187.713,
1450
+ "eval_steps_per_second": 5.868,
1451
+ "step": 1850
1452
+ },
1453
+ {
1454
+ "epoch": 38.74,
1455
+ "learning_rate": 0.00027300956614000115,
1456
+ "loss": 3.2707,
1457
+ "step": 1860
1458
+ },
1459
+ {
1460
+ "epoch": 38.95,
1461
+ "learning_rate": 0.00027177648882141704,
1462
+ "loss": 3.276,
1463
+ "step": 1870
1464
+ },
1465
+ {
1466
+ "epoch": 39.16,
1467
+ "learning_rate": 0.0002705402717666883,
1468
+ "loss": 3.4633,
1469
+ "step": 1880
1470
+ },
1471
+ {
1472
+ "epoch": 39.37,
1473
+ "learning_rate": 0.00026930096905195363,
1474
+ "loss": 3.2392,
1475
+ "step": 1890
1476
+ },
1477
+ {
1478
+ "epoch": 39.58,
1479
+ "learning_rate": 0.00026805863488832865,
1480
+ "loss": 3.2382,
1481
+ "step": 1900
1482
+ },
1483
+ {
1484
+ "epoch": 39.58,
1485
+ "eval_accuracy": 0.40741369917263087,
1486
+ "eval_loss": 3.5548534393310547,
1487
+ "eval_runtime": 128.15,
1488
+ "eval_samples_per_second": 187.71,
1489
+ "eval_steps_per_second": 5.868,
1490
+ "step": 1900
1491
+ },
1492
+ {
1493
+ "epoch": 39.78,
1494
+ "learning_rate": 0.00026681332361953424,
1495
+ "loss": 3.2304,
1496
+ "step": 1910
1497
+ },
1498
+ {
1499
+ "epoch": 39.99,
1500
+ "learning_rate": 0.0002655650897195195,
1501
+ "loss": 3.24,
1502
+ "step": 1920
1503
+ },
1504
+ {
1505
+ "epoch": 40.21,
1506
+ "learning_rate": 0.0002643139877900791,
1507
+ "loss": 3.4143,
1508
+ "step": 1930
1509
+ },
1510
+ {
1511
+ "epoch": 40.41,
1512
+ "learning_rate": 0.00026306007255846436,
1513
+ "loss": 3.203,
1514
+ "step": 1940
1515
+ },
1516
+ {
1517
+ "epoch": 40.62,
1518
+ "learning_rate": 0.00026180339887498953,
1519
+ "loss": 3.2151,
1520
+ "step": 1950
1521
+ },
1522
+ {
1523
+ "epoch": 40.62,
1524
+ "eval_accuracy": 0.41034103588846577,
1525
+ "eval_loss": 3.5284957885742188,
1526
+ "eval_runtime": 128.1661,
1527
+ "eval_samples_per_second": 187.686,
1528
+ "eval_steps_per_second": 5.867,
1529
+ "step": 1950
1530
+ },
1531
+ {
1532
+ "epoch": 40.82,
1533
+ "learning_rate": 0.00026054402171063267,
1534
+ "loss": 3.2063,
1535
+ "step": 1960
1536
+ },
1537
+ {
1538
+ "epoch": 41.04,
1539
+ "learning_rate": 0.0002592819961546308,
1540
+ "loss": 3.4173,
1541
+ "step": 1970
1542
+ },
1543
+ {
1544
+ "epoch": 41.25,
1545
+ "learning_rate": 0.00025801737741207005,
1546
+ "loss": 3.1796,
1547
+ "step": 1980
1548
+ },
1549
+ {
1550
+ "epoch": 41.45,
1551
+ "learning_rate": 0.000256750220801471,
1552
+ "loss": 3.1799,
1553
+ "step": 1990
1554
+ },
1555
+ {
1556
+ "epoch": 41.66,
1557
+ "learning_rate": 0.0002554805817523689,
1558
+ "loss": 3.1777,
1559
+ "step": 2000
1560
+ },
1561
+ {
1562
+ "epoch": 41.66,
1563
+ "eval_accuracy": 0.41320715004942615,
1564
+ "eval_loss": 3.506920337677002,
1565
+ "eval_runtime": 128.1475,
1566
+ "eval_samples_per_second": 187.713,
1567
+ "eval_steps_per_second": 5.868,
1568
+ "step": 2000
1569
+ },
1570
+ {
1571
+ "epoch": 41.86,
1572
+ "learning_rate": 0.0002542085158028889,
1573
+ "loss": 3.1791,
1574
+ "step": 2010
1575
+ },
1576
+ {
1577
+ "epoch": 42.08,
1578
+ "learning_rate": 0.00025293407859731633,
1579
+ "loss": 3.363,
1580
+ "step": 2020
1581
+ },
1582
+ {
1583
+ "epoch": 42.29,
1584
+ "learning_rate": 0.00025165732588366334,
1585
+ "loss": 3.1381,
1586
+ "step": 2030
1587
+ },
1588
+ {
1589
+ "epoch": 42.49,
1590
+ "learning_rate": 0.00025037831351122967,
1591
+ "loss": 3.1556,
1592
+ "step": 2040
1593
+ },
1594
+ {
1595
+ "epoch": 42.7,
1596
+ "learning_rate": 0.0002490970974281599,
1597
+ "loss": 3.1499,
1598
+ "step": 2050
1599
+ },
1600
+ {
1601
+ "epoch": 42.7,
1602
+ "eval_accuracy": 0.4150287828947368,
1603
+ "eval_loss": 3.491703987121582,
1604
+ "eval_runtime": 128.345,
1605
+ "eval_samples_per_second": 187.425,
1606
+ "eval_steps_per_second": 5.859,
1607
+ "step": 2050
1608
+ },
1609
+ {
1610
+ "epoch": 42.9,
1611
+ "learning_rate": 0.00024781373367899597,
1612
+ "loss": 3.1374,
1613
+ "step": 2060
1614
+ },
1615
+ {
1616
+ "epoch": 43.12,
1617
+ "learning_rate": 0.00024652827840222606,
1618
+ "loss": 3.3333,
1619
+ "step": 2070
1620
+ },
1621
+ {
1622
+ "epoch": 43.33,
1623
+ "learning_rate": 0.00024524078782782807,
1624
+ "loss": 3.1338,
1625
+ "step": 2080
1626
+ },
1627
+ {
1628
+ "epoch": 43.53,
1629
+ "learning_rate": 0.00024395131827481062,
1630
+ "loss": 3.1092,
1631
+ "step": 2090
1632
+ },
1633
+ {
1634
+ "epoch": 43.74,
1635
+ "learning_rate": 0.0002426599261487494,
1636
+ "loss": 3.131,
1637
+ "step": 2100
1638
+ },
1639
+ {
1640
+ "epoch": 43.74,
1641
+ "eval_accuracy": 0.4168157298218596,
1642
+ "eval_loss": 3.4700751304626465,
1643
+ "eval_runtime": 128.1928,
1644
+ "eval_samples_per_second": 187.647,
1645
+ "eval_steps_per_second": 5.866,
1646
+ "step": 2100
1647
+ },
1648
+ {
1649
+ "epoch": 43.95,
1650
+ "learning_rate": 0.00024136666793931935,
1651
+ "loss": 3.1197,
1652
+ "step": 2110
1653
+ },
1654
+ {
1655
+ "epoch": 44.16,
1656
+ "learning_rate": 0.00024007160021782427,
1657
+ "loss": 3.3001,
1658
+ "step": 2120
1659
+ },
1660
+ {
1661
+ "epoch": 44.37,
1662
+ "learning_rate": 0.0002387747796347217,
1663
+ "loss": 3.0929,
1664
+ "step": 2130
1665
+ },
1666
+ {
1667
+ "epoch": 44.58,
1668
+ "learning_rate": 0.00023747626291714498,
1669
+ "loss": 3.0968,
1670
+ "step": 2140
1671
+ },
1672
+ {
1673
+ "epoch": 44.78,
1674
+ "learning_rate": 0.000236176106866422,
1675
+ "loss": 3.0942,
1676
+ "step": 2150
1677
+ },
1678
+ {
1679
+ "epoch": 44.78,
1680
+ "eval_accuracy": 0.4189149215354626,
1681
+ "eval_loss": 3.4530041217803955,
1682
+ "eval_runtime": 128.3801,
1683
+ "eval_samples_per_second": 187.373,
1684
+ "eval_steps_per_second": 5.858,
1685
+ "step": 2150
1686
+ },
1687
+ {
1688
+ "epoch": 44.99,
1689
+ "learning_rate": 0.00023487436835559035,
1690
+ "loss": 3.1072,
1691
+ "step": 2160
1692
+ },
1693
+ {
1694
+ "epoch": 45.21,
1695
+ "learning_rate": 0.00023357110432690954,
1696
+ "loss": 3.268,
1697
+ "step": 2170
1698
+ },
1699
+ {
1700
+ "epoch": 45.41,
1701
+ "learning_rate": 0.00023226637178937022,
1702
+ "loss": 3.0772,
1703
+ "step": 2180
1704
+ },
1705
+ {
1706
+ "epoch": 45.62,
1707
+ "learning_rate": 0.00023096022781620034,
1708
+ "loss": 3.071,
1709
+ "step": 2190
1710
+ },
1711
+ {
1712
+ "epoch": 45.82,
1713
+ "learning_rate": 0.0002296527295423684,
1714
+ "loss": 3.0683,
1715
+ "step": 2200
1716
+ },
1717
+ {
1718
+ "epoch": 45.82,
1719
+ "eval_accuracy": 0.42115190069686975,
1720
+ "eval_loss": 3.4319911003112793,
1721
+ "eval_runtime": 128.2347,
1722
+ "eval_samples_per_second": 187.586,
1723
+ "eval_steps_per_second": 5.864,
1724
+ "step": 2200
1725
+ },
1726
+ {
1727
+ "epoch": 46.04,
1728
+ "learning_rate": 0.00022834393416208486,
1729
+ "loss": 3.2606,
1730
+ "step": 2210
1731
+ },
1732
+ {
1733
+ "epoch": 46.25,
1734
+ "learning_rate": 0.0002270338989262994,
1735
+ "loss": 3.0464,
1736
+ "step": 2220
1737
+ },
1738
+ {
1739
+ "epoch": 46.45,
1740
+ "learning_rate": 0.00022572268114019726,
1741
+ "loss": 3.0424,
1742
+ "step": 2230
1743
+ },
1744
+ {
1745
+ "epoch": 46.66,
1746
+ "learning_rate": 0.00022441033816069202,
1747
+ "loss": 3.0469,
1748
+ "step": 2240
1749
+ },
1750
+ {
1751
+ "epoch": 46.86,
1752
+ "learning_rate": 0.00022309692739391727,
1753
+ "loss": 3.0363,
1754
+ "step": 2250
1755
+ },
1756
+ {
1757
+ "epoch": 46.86,
1758
+ "eval_accuracy": 0.42269065003604217,
1759
+ "eval_loss": 3.419463872909546,
1760
+ "eval_runtime": 128.2052,
1761
+ "eval_samples_per_second": 187.629,
1762
+ "eval_steps_per_second": 5.866,
1763
+ "step": 2250
1764
+ },
1765
+ {
1766
+ "epoch": 47.08,
1767
+ "learning_rate": 0.00022178250629271452,
1768
+ "loss": 3.2579,
1769
+ "step": 2260
1770
+ },
1771
+ {
1772
+ "epoch": 47.29,
1773
+ "learning_rate": 0.00022046713235412103,
1774
+ "loss": 3.0223,
1775
+ "step": 2270
1776
+ },
1777
+ {
1778
+ "epoch": 47.49,
1779
+ "learning_rate": 0.00021915086311685404,
1780
+ "loss": 3.0431,
1781
+ "step": 2280
1782
+ },
1783
+ {
1784
+ "epoch": 47.7,
1785
+ "learning_rate": 0.00021783375615879415,
1786
+ "loss": 3.0339,
1787
+ "step": 2290
1788
+ },
1789
+ {
1790
+ "epoch": 47.9,
1791
+ "learning_rate": 0.0002165158690944665,
1792
+ "loss": 3.0264,
1793
+ "step": 2300
1794
+ },
1795
+ {
1796
+ "epoch": 47.9,
1797
+ "eval_accuracy": 0.4248503269779865,
1798
+ "eval_loss": 3.4046127796173096,
1799
+ "eval_runtime": 128.0991,
1800
+ "eval_samples_per_second": 187.784,
1801
+ "eval_steps_per_second": 5.87,
1802
+ "step": 2300
1803
+ },
1804
+ {
1805
+ "epoch": 48.12,
1806
+ "learning_rate": 0.00021519725957252063,
1807
+ "loss": 3.2189,
1808
+ "step": 2310
1809
+ },
1810
+ {
1811
+ "epoch": 48.33,
1812
+ "learning_rate": 0.00021387798527320882,
1813
+ "loss": 3.0121,
1814
+ "step": 2320
1815
+ },
1816
+ {
1817
+ "epoch": 48.53,
1818
+ "learning_rate": 0.0002125581039058627,
1819
+ "loss": 3.0031,
1820
+ "step": 2330
1821
+ },
1822
+ {
1823
+ "epoch": 48.74,
1824
+ "learning_rate": 0.0002112376732063691,
1825
+ "loss": 2.9933,
1826
+ "step": 2340
1827
+ },
1828
+ {
1829
+ "epoch": 48.95,
1830
+ "learning_rate": 0.00020991675093464448,
1831
+ "loss": 3.0079,
1832
+ "step": 2350
1833
+ },
1834
+ {
1835
+ "epoch": 48.95,
1836
+ "eval_accuracy": 0.4266670495134685,
1837
+ "eval_loss": 3.3874006271362305,
1838
+ "eval_runtime": 128.1597,
1839
+ "eval_samples_per_second": 187.696,
1840
+ "eval_steps_per_second": 5.868,
1841
+ "step": 2350
1842
+ },
1843
+ {
1844
+ "epoch": 49.16,
1845
+ "learning_rate": 0.00020859539487210813,
1846
+ "loss": 3.2002,
1847
+ "step": 2360
1848
+ },
1849
+ {
1850
+ "epoch": 49.37,
1851
+ "learning_rate": 0.0002072736628191549,
1852
+ "loss": 2.9868,
1853
+ "step": 2370
1854
+ },
1855
+ {
1856
+ "epoch": 49.58,
1857
+ "learning_rate": 0.0002059516125926265,
1858
+ "loss": 2.9996,
1859
+ "step": 2380
1860
+ },
1861
+ {
1862
+ "epoch": 49.78,
1863
+ "learning_rate": 0.00020462930202328278,
1864
+ "loss": 2.9719,
1865
+ "step": 2390
1866
+ },
1867
+ {
1868
+ "epoch": 49.99,
1869
+ "learning_rate": 0.00020330678895327174,
1870
+ "loss": 2.9869,
1871
+ "step": 2400
1872
+ },
1873
+ {
1874
+ "epoch": 49.99,
1875
+ "eval_accuracy": 0.42770797209455824,
1876
+ "eval_loss": 3.3792383670806885,
1877
+ "eval_runtime": 128.195,
1878
+ "eval_samples_per_second": 187.644,
1879
+ "eval_steps_per_second": 5.866,
1880
+ "step": 2400
1881
+ },
1882
+ {
1883
+ "epoch": 50.21,
1884
+ "learning_rate": 0.00020198413123359926,
1885
+ "loss": 3.1735,
1886
+ "step": 2410
1887
+ },
1888
+ {
1889
+ "epoch": 50.41,
1890
+ "learning_rate": 0.00020066138672159903,
1891
+ "loss": 2.9707,
1892
+ "step": 2420
1893
+ },
1894
+ {
1895
+ "epoch": 50.62,
1896
+ "learning_rate": 0.00019933861327840098,
1897
+ "loss": 2.9682,
1898
+ "step": 2430
1899
+ },
1900
+ {
1901
+ "epoch": 50.82,
1902
+ "learning_rate": 0.00019801586876640073,
1903
+ "loss": 2.9752,
1904
+ "step": 2440
1905
+ },
1906
+ {
1907
+ "epoch": 51.04,
1908
+ "learning_rate": 0.0001966932110467283,
1909
+ "loss": 3.1592,
1910
+ "step": 2450
1911
+ },
1912
+ {
1913
+ "epoch": 51.04,
1914
+ "eval_accuracy": 0.4289155778595229,
1915
+ "eval_loss": 3.3654892444610596,
1916
+ "eval_runtime": 128.18,
1917
+ "eval_samples_per_second": 187.666,
1918
+ "eval_steps_per_second": 5.867,
1919
+ "step": 2450
1920
+ },
1921
+ {
1922
+ "epoch": 51.25,
1923
+ "learning_rate": 0.00019537069797671724,
1924
+ "loss": 2.948,
1925
+ "step": 2460
1926
+ },
1927
+ {
1928
+ "epoch": 51.45,
1929
+ "learning_rate": 0.0001940483874073735,
1930
+ "loss": 2.9368,
1931
+ "step": 2470
1932
+ },
1933
+ {
1934
+ "epoch": 51.66,
1935
+ "learning_rate": 0.00019272633718084517,
1936
+ "loss": 2.9466,
1937
+ "step": 2480
1938
+ },
1939
+ {
1940
+ "epoch": 51.86,
1941
+ "learning_rate": 0.0001914046051278919,
1942
+ "loss": 2.9541,
1943
+ "step": 2490
1944
+ },
1945
+ {
1946
+ "epoch": 52.08,
1947
+ "learning_rate": 0.00019008324906535554,
1948
+ "loss": 3.1353,
1949
+ "step": 2500
1950
+ },
1951
+ {
1952
+ "epoch": 52.08,
1953
+ "eval_accuracy": 0.43104300088517533,
1954
+ "eval_loss": 3.3548085689544678,
1955
+ "eval_runtime": 128.179,
1956
+ "eval_samples_per_second": 187.667,
1957
+ "eval_steps_per_second": 5.867,
1958
+ "step": 2500
1959
+ },
1960
+ {
1961
+ "epoch": 52.29,
1962
+ "learning_rate": 0.0001887623267936309,
1963
+ "loss": 2.9264,
1964
+ "step": 2510
1965
+ },
1966
+ {
1967
+ "epoch": 52.49,
1968
+ "learning_rate": 0.00018744189609413734,
1969
+ "loss": 2.9325,
1970
+ "step": 2520
1971
+ },
1972
+ {
1973
+ "epoch": 52.7,
1974
+ "learning_rate": 0.0001861220147267912,
1975
+ "loss": 2.9263,
1976
+ "step": 2530
1977
+ },
1978
+ {
1979
+ "epoch": 52.9,
1980
+ "learning_rate": 0.0001848027404274794,
1981
+ "loss": 2.9275,
1982
+ "step": 2540
1983
+ },
1984
+ {
1985
+ "epoch": 53.12,
1986
+ "learning_rate": 0.00018348413090553354,
1987
+ "loss": 3.1257,
1988
+ "step": 2550
1989
+ },
1990
+ {
1991
+ "epoch": 53.12,
1992
+ "eval_accuracy": 0.43083924373522625,
1993
+ "eval_loss": 3.348921775817871,
1994
+ "eval_runtime": 128.2056,
1995
+ "eval_samples_per_second": 187.628,
1996
+ "eval_steps_per_second": 5.866,
1997
+ "step": 2550
1998
+ },
1999
+ {
2000
+ "epoch": 53.33,
2001
+ "learning_rate": 0.00018216624384120595,
2002
+ "loss": 2.9018,
2003
+ "step": 2560
2004
+ },
2005
+ {
2006
+ "epoch": 53.53,
2007
+ "learning_rate": 0.00018084913688314597,
2008
+ "loss": 2.9135,
2009
+ "step": 2570
2010
+ },
2011
+ {
2012
+ "epoch": 53.74,
2013
+ "learning_rate": 0.000179532867645879,
2014
+ "loss": 2.9067,
2015
+ "step": 2580
2016
+ },
2017
+ {
2018
+ "epoch": 53.95,
2019
+ "learning_rate": 0.0001782174937072855,
2020
+ "loss": 2.9146,
2021
+ "step": 2590
2022
+ },
2023
+ {
2024
+ "epoch": 54.16,
2025
+ "learning_rate": 0.00017690307260608278,
2026
+ "loss": 3.0822,
2027
+ "step": 2600
2028
+ },
2029
+ {
2030
+ "epoch": 54.16,
2031
+ "eval_accuracy": 0.4326622137249495,
2032
+ "eval_loss": 3.3352506160736084,
2033
+ "eval_runtime": 128.2781,
2034
+ "eval_samples_per_second": 187.522,
2035
+ "eval_steps_per_second": 5.862,
2036
+ "step": 2600
2037
+ },
2038
+ {
2039
+ "epoch": 54.37,
2040
+ "learning_rate": 0.000175589661839308,
2041
+ "loss": 2.8995,
2042
+ "step": 2610
2043
+ },
2044
+ {
2045
+ "epoch": 54.58,
2046
+ "learning_rate": 0.00017427731885980282,
2047
+ "loss": 2.8945,
2048
+ "step": 2620
2049
+ },
2050
+ {
2051
+ "epoch": 54.78,
2052
+ "learning_rate": 0.0001729661010737007,
2053
+ "loss": 2.905,
2054
+ "step": 2630
2055
+ },
2056
+ {
2057
+ "epoch": 54.99,
2058
+ "learning_rate": 0.00017165606583791515,
2059
+ "loss": 2.9128,
2060
+ "step": 2640
2061
+ },
2062
+ {
2063
+ "epoch": 55.21,
2064
+ "learning_rate": 0.00017034727045763158,
2065
+ "loss": 3.0771,
2066
+ "step": 2650
2067
+ },
2068
+ {
2069
+ "epoch": 55.21,
2070
+ "eval_accuracy": 0.434098312415683,
2071
+ "eval_loss": 3.3219847679138184,
2072
+ "eval_runtime": 128.3146,
2073
+ "eval_samples_per_second": 187.469,
2074
+ "eval_steps_per_second": 5.861,
2075
+ "step": 2650
2076
+ },
2077
+ {
2078
+ "epoch": 55.41,
2079
+ "learning_rate": 0.00016903977218379974,
2080
+ "loss": 2.8695,
2081
+ "step": 2660
2082
+ },
2083
+ {
2084
+ "epoch": 55.62,
2085
+ "learning_rate": 0.00016773362821062983,
2086
+ "loss": 2.8839,
2087
+ "step": 2670
2088
+ },
2089
+ {
2090
+ "epoch": 55.82,
2091
+ "learning_rate": 0.00016642889567309048,
2092
+ "loss": 2.8887,
2093
+ "step": 2680
2094
+ },
2095
+ {
2096
+ "epoch": 56.04,
2097
+ "learning_rate": 0.0001651256316444097,
2098
+ "loss": 3.0754,
2099
+ "step": 2690
2100
+ },
2101
+ {
2102
+ "epoch": 56.25,
2103
+ "learning_rate": 0.0001638238931335781,
2104
+ "loss": 2.8639,
2105
+ "step": 2700
2106
+ },
2107
+ {
2108
+ "epoch": 56.25,
2109
+ "eval_accuracy": 0.4353990105725288,
2110
+ "eval_loss": 3.3119492530822754,
2111
+ "eval_runtime": 128.0745,
2112
+ "eval_samples_per_second": 187.82,
2113
+ "eval_steps_per_second": 5.872,
2114
+ "step": 2700
2115
+ },
2116
+ {
2117
+ "epoch": 56.45,
2118
+ "learning_rate": 0.00016252373708285504,
2119
+ "loss": 2.8653,
2120
+ "step": 2710
2121
+ },
2122
+ {
2123
+ "epoch": 56.66,
2124
+ "learning_rate": 0.00016122522036527838,
2125
+ "loss": 2.8696,
2126
+ "step": 2720
2127
+ },
2128
+ {
2129
+ "epoch": 56.86,
2130
+ "learning_rate": 0.00015992839978217578,
2131
+ "loss": 2.8665,
2132
+ "step": 2730
2133
+ },
2134
+ {
2135
+ "epoch": 57.08,
2136
+ "learning_rate": 0.00015863333206068067,
2137
+ "loss": 3.0651,
2138
+ "step": 2740
2139
+ },
2140
+ {
2141
+ "epoch": 57.29,
2142
+ "learning_rate": 0.00015734007385125067,
2143
+ "loss": 2.8477,
2144
+ "step": 2750
2145
+ },
2146
+ {
2147
+ "epoch": 57.29,
2148
+ "eval_accuracy": 0.4360402472560164,
2149
+ "eval_loss": 3.310389280319214,
2150
+ "eval_runtime": 128.2649,
2151
+ "eval_samples_per_second": 187.542,
2152
+ "eval_steps_per_second": 5.863,
2153
+ "step": 2750
2154
+ },
2155
+ {
2156
+ "epoch": 57.49,
2157
+ "learning_rate": 0.0001560486817251894,
2158
+ "loss": 2.8511,
2159
+ "step": 2760
2160
+ },
2161
+ {
2162
+ "epoch": 57.7,
2163
+ "learning_rate": 0.000154759212172172,
2164
+ "loss": 2.8615,
2165
+ "step": 2770
2166
+ },
2167
+ {
2168
+ "epoch": 57.9,
2169
+ "learning_rate": 0.00015347172159777396,
2170
+ "loss": 2.8619,
2171
+ "step": 2780
2172
+ },
2173
+ {
2174
+ "epoch": 58.12,
2175
+ "learning_rate": 0.000152186266321004,
2176
+ "loss": 3.0316,
2177
+ "step": 2790
2178
+ },
2179
+ {
2180
+ "epoch": 58.33,
2181
+ "learning_rate": 0.0001509029025718402,
2182
+ "loss": 2.8373,
2183
+ "step": 2800
2184
+ },
2185
+ {
2186
+ "epoch": 58.33,
2187
+ "eval_accuracy": 0.4378144877232535,
2188
+ "eval_loss": 3.295414686203003,
2189
+ "eval_runtime": 128.0673,
2190
+ "eval_samples_per_second": 187.831,
2191
+ "eval_steps_per_second": 5.872,
2192
+ "step": 2800
2193
+ },
2194
+ {
2195
+ "epoch": 58.53,
2196
+ "learning_rate": 0.0001496216864887704,
2197
+ "loss": 2.8292,
2198
+ "step": 2810
2199
+ },
2200
+ {
2201
+ "epoch": 58.74,
2202
+ "learning_rate": 0.00014834267411633674,
2203
+ "loss": 2.8361,
2204
+ "step": 2820
2205
+ },
2206
+ {
2207
+ "epoch": 58.95,
2208
+ "learning_rate": 0.0001470659214026837,
2209
+ "loss": 2.8417,
2210
+ "step": 2830
2211
+ },
2212
+ {
2213
+ "epoch": 59.16,
2214
+ "learning_rate": 0.00014579148419711119,
2215
+ "loss": 3.0263,
2216
+ "step": 2840
2217
+ },
2218
+ {
2219
+ "epoch": 59.37,
2220
+ "learning_rate": 0.00014451941824763113,
2221
+ "loss": 2.818,
2222
+ "step": 2850
2223
+ },
2224
+ {
2225
+ "epoch": 59.37,
2226
+ "eval_accuracy": 0.43805501144654146,
2227
+ "eval_loss": 3.2935194969177246,
2228
+ "eval_runtime": 128.0232,
2229
+ "eval_samples_per_second": 187.896,
2230
+ "eval_steps_per_second": 5.874,
2231
+ "step": 2850
2232
+ },
2233
+ {
2234
+ "epoch": 59.58,
2235
+ "learning_rate": 0.000143249779198529,
2236
+ "loss": 2.8253,
2237
+ "step": 2860
2238
+ },
2239
+ {
2240
+ "epoch": 59.78,
2241
+ "learning_rate": 0.00014198262258793002,
2242
+ "loss": 2.8424,
2243
+ "step": 2870
2244
+ },
2245
+ {
2246
+ "epoch": 59.99,
2247
+ "learning_rate": 0.00014071800384536927,
2248
+ "loss": 2.8335,
2249
+ "step": 2880
2250
+ },
2251
+ {
2252
+ "epoch": 60.21,
2253
+ "learning_rate": 0.00013945597828936737,
2254
+ "loss": 2.9887,
2255
+ "step": 2890
2256
+ },
2257
+ {
2258
+ "epoch": 60.41,
2259
+ "learning_rate": 0.00013819660112501054,
2260
+ "loss": 2.8137,
2261
+ "step": 2900
2262
+ },
2263
+ {
2264
+ "epoch": 60.41,
2265
+ "eval_accuracy": 0.4394361605808428,
2266
+ "eval_loss": 3.278566598892212,
2267
+ "eval_runtime": 128.078,
2268
+ "eval_samples_per_second": 187.815,
2269
+ "eval_steps_per_second": 5.871,
2270
+ "step": 2900
2271
+ },
2272
+ {
2273
+ "epoch": 60.62,
2274
+ "learning_rate": 0.00013693992744153572,
2275
+ "loss": 2.8271,
2276
+ "step": 2910
2277
+ },
2278
+ {
2279
+ "epoch": 60.82,
2280
+ "learning_rate": 0.00013568601220992097,
2281
+ "loss": 2.8286,
2282
+ "step": 2920
2283
+ },
2284
+ {
2285
+ "epoch": 61.04,
2286
+ "learning_rate": 0.00013443491028048045,
2287
+ "loss": 3.0135,
2288
+ "step": 2930
2289
+ },
2290
+ {
2291
+ "epoch": 61.25,
2292
+ "learning_rate": 0.0001331866763804658,
2293
+ "loss": 2.8038,
2294
+ "step": 2940
2295
+ },
2296
+ {
2297
+ "epoch": 61.45,
2298
+ "learning_rate": 0.0001319413651116714,
2299
+ "loss": 2.7985,
2300
+ "step": 2950
2301
+ },
2302
+ {
2303
+ "epoch": 61.45,
2304
+ "eval_accuracy": 0.4401244630436134,
2305
+ "eval_loss": 3.2746615409851074,
2306
+ "eval_runtime": 128.0922,
2307
+ "eval_samples_per_second": 187.794,
2308
+ "eval_steps_per_second": 5.871,
2309
+ "step": 2950
2310
+ },
2311
+ {
2312
+ "epoch": 61.66,
2313
+ "learning_rate": 0.00013069903094804644,
2314
+ "loss": 2.7993,
2315
+ "step": 2960
2316
+ },
2317
+ {
2318
+ "epoch": 61.86,
2319
+ "learning_rate": 0.0001294597282333118,
2320
+ "loss": 2.8132,
2321
+ "step": 2970
2322
+ },
2323
+ {
2324
+ "epoch": 62.08,
2325
+ "learning_rate": 0.00012822351117858303,
2326
+ "loss": 2.9785,
2327
+ "step": 2980
2328
+ },
2329
+ {
2330
+ "epoch": 62.29,
2331
+ "learning_rate": 0.0001269904338599989,
2332
+ "loss": 2.7959,
2333
+ "step": 2990
2334
+ },
2335
+ {
2336
+ "epoch": 62.49,
2337
+ "learning_rate": 0.0001257605502163558,
2338
+ "loss": 2.7936,
2339
+ "step": 3000
2340
+ },
2341
+ {
2342
+ "epoch": 62.49,
2343
+ "eval_accuracy": 0.44108544914689357,
2344
+ "eval_loss": 3.266845941543579,
2345
+ "eval_runtime": 128.1403,
2346
+ "eval_samples_per_second": 187.724,
2347
+ "eval_steps_per_second": 5.869,
2348
+ "step": 3000
2349
+ },
2350
+ {
2351
+ "epoch": 62.7,
2352
+ "learning_rate": 0.00012453391404674885,
2353
+ "loss": 2.7904,
2354
+ "step": 3010
2355
+ },
2356
+ {
2357
+ "epoch": 62.9,
2358
+ "learning_rate": 0.00012331057900821768,
2359
+ "loss": 2.7934,
2360
+ "step": 3020
2361
+ },
2362
+ {
2363
+ "epoch": 63.12,
2364
+ "learning_rate": 0.0001220905986134,
2365
+ "loss": 2.9571,
2366
+ "step": 3030
2367
+ },
2368
+ {
2369
+ "epoch": 63.33,
2370
+ "learning_rate": 0.00012087402622819039,
2371
+ "loss": 2.7925,
2372
+ "step": 3040
2373
+ },
2374
+ {
2375
+ "epoch": 63.53,
2376
+ "learning_rate": 0.00011966091506940616,
2377
+ "loss": 2.7764,
2378
+ "step": 3050
2379
+ },
2380
+ {
2381
+ "epoch": 63.53,
2382
+ "eval_accuracy": 0.441903341927903,
2383
+ "eval_loss": 3.256887197494507,
2384
+ "eval_runtime": 128.1402,
2385
+ "eval_samples_per_second": 187.724,
2386
+ "eval_steps_per_second": 5.869,
2387
+ "step": 3050
2388
+ },
2389
+ {
2390
+ "epoch": 63.74,
2391
+ "learning_rate": 0.00011845131820245934,
2392
+ "loss": 2.7851,
2393
+ "step": 3060
2394
+ },
2395
+ {
2396
+ "epoch": 63.95,
2397
+ "learning_rate": 0.00011724528853903536,
2398
+ "loss": 2.7837,
2399
+ "step": 3070
2400
+ },
2401
+ {
2402
+ "epoch": 64.16,
2403
+ "learning_rate": 0.00011604287883477889,
2404
+ "loss": 2.9344,
2405
+ "step": 3080
2406
+ },
2407
+ {
2408
+ "epoch": 64.37,
2409
+ "learning_rate": 0.00011484414168698547,
2410
+ "loss": 2.7703,
2411
+ "step": 3090
2412
+ },
2413
+ {
2414
+ "epoch": 64.58,
2415
+ "learning_rate": 0.00011364912953230145,
2416
+ "loss": 2.7819,
2417
+ "step": 3100
2418
+ },
2419
+ {
2420
+ "epoch": 64.58,
2421
+ "eval_accuracy": 0.44339571520227505,
2422
+ "eval_loss": 3.2492308616638184,
2423
+ "eval_runtime": 128.0362,
2424
+ "eval_samples_per_second": 187.877,
2425
+ "eval_steps_per_second": 5.873,
2426
+ "step": 3100
2427
+ },
2428
+ {
2429
+ "epoch": 64.78,
2430
+ "learning_rate": 0.00011245789464442964,
2431
+ "loss": 2.7841,
2432
+ "step": 3110
2433
+ },
2434
+ {
2435
+ "epoch": 64.99,
2436
+ "learning_rate": 0.00011127048913184326,
2437
+ "loss": 2.7794,
2438
+ "step": 3120
2439
+ },
2440
+ {
2441
+ "epoch": 65.21,
2442
+ "learning_rate": 0.00011008696493550599,
2443
+ "loss": 2.9422,
2444
+ "step": 3130
2445
+ },
2446
+ {
2447
+ "epoch": 65.41,
2448
+ "learning_rate": 0.00010890737382660015,
2449
+ "loss": 2.7573,
2450
+ "step": 3140
2451
+ },
2452
+ {
2453
+ "epoch": 65.62,
2454
+ "learning_rate": 0.00010773176740426248,
2455
+ "loss": 2.7672,
2456
+ "step": 3150
2457
+ },
2458
+ {
2459
+ "epoch": 65.62,
2460
+ "eval_accuracy": 0.4433201371393935,
2461
+ "eval_loss": 3.2493698596954346,
2462
+ "eval_runtime": 128.0217,
2463
+ "eval_samples_per_second": 187.898,
2464
+ "eval_steps_per_second": 5.874,
2465
+ "step": 3150
2466
+ },
2467
+ {
2468
+ "epoch": 65.82,
2469
+ "learning_rate": 0.00010656019709332606,
2470
+ "loss": 2.7557,
2471
+ "step": 3160
2472
+ },
2473
+ {
2474
+ "epoch": 66.04,
2475
+ "learning_rate": 0.00010539271414207186,
2476
+ "loss": 2.9353,
2477
+ "step": 3170
2478
+ },
2479
+ {
2480
+ "epoch": 66.25,
2481
+ "learning_rate": 0.00010422936961998609,
2482
+ "loss": 2.7494,
2483
+ "step": 3180
2484
+ },
2485
+ {
2486
+ "epoch": 66.45,
2487
+ "learning_rate": 0.00010307021441552707,
2488
+ "loss": 2.7401,
2489
+ "step": 3190
2490
+ },
2491
+ {
2492
+ "epoch": 66.66,
2493
+ "learning_rate": 0.00010191529923389845,
2494
+ "loss": 2.7629,
2495
+ "step": 3200
2496
+ },
2497
+ {
2498
+ "epoch": 66.66,
2499
+ "eval_accuracy": 0.44430680533611233,
2500
+ "eval_loss": 3.240968704223633,
2501
+ "eval_runtime": 128.0927,
2502
+ "eval_samples_per_second": 187.794,
2503
+ "eval_steps_per_second": 5.871,
2504
+ "step": 3200
2505
+ },
2506
+ {
2507
+ "epoch": 66.86,
2508
+ "learning_rate": 0.00010076467459483155,
2509
+ "loss": 2.7537,
2510
+ "step": 3210
2511
+ },
2512
+ {
2513
+ "epoch": 67.08,
2514
+ "learning_rate": 9.961839083037592e-05,
2515
+ "loss": 2.9359,
2516
+ "step": 3220
2517
+ },
2518
+ {
2519
+ "epoch": 67.29,
2520
+ "learning_rate": 9.847649808269658e-05,
2521
+ "loss": 2.7575,
2522
+ "step": 3230
2523
+ },
2524
+ {
2525
+ "epoch": 67.49,
2526
+ "learning_rate": 9.733904630188176e-05,
2527
+ "loss": 2.7294,
2528
+ "step": 3240
2529
+ },
2530
+ {
2531
+ "epoch": 67.7,
2532
+ "learning_rate": 9.620608524375703e-05,
2533
+ "loss": 2.747,
2534
+ "step": 3250
2535
+ },
2536
+ {
2537
+ "epoch": 67.7,
2538
+ "eval_accuracy": 0.4446199448310505,
2539
+ "eval_loss": 3.236819267272949,
2540
+ "eval_runtime": 127.9991,
2541
+ "eval_samples_per_second": 187.931,
2542
+ "eval_steps_per_second": 5.875,
2543
+ "step": 3250
2544
+ },
2545
+ {
2546
+ "epoch": 67.9,
2547
+ "learning_rate": 9.507766446770934e-05,
2548
+ "loss": 2.7458,
2549
+ "step": 3260
2550
+ },
2551
+ {
2552
+ "epoch": 68.12,
2553
+ "learning_rate": 9.39538333345191e-05,
2554
+ "loss": 2.9246,
2555
+ "step": 3270
2556
+ },
2557
+ {
2558
+ "epoch": 68.33,
2559
+ "learning_rate": 9.283464100420063e-05,
2560
+ "loss": 2.741,
2561
+ "step": 3280
2562
+ },
2563
+ {
2564
+ "epoch": 68.53,
2565
+ "learning_rate": 9.17201364338524e-05,
2566
+ "loss": 2.7421,
2567
+ "step": 3290
2568
+ },
2569
+ {
2570
+ "epoch": 68.74,
2571
+ "learning_rate": 9.061036837551466e-05,
2572
+ "loss": 2.7303,
2573
+ "step": 3300
2574
+ },
2575
+ {
2576
+ "epoch": 68.74,
2577
+ "eval_accuracy": 0.44596990309042184,
2578
+ "eval_loss": 3.224606990814209,
2579
+ "eval_runtime": 128.0175,
2580
+ "eval_samples_per_second": 187.904,
2581
+ "eval_steps_per_second": 5.874,
2582
+ "step": 3300
2583
+ },
2584
+ {
2585
+ "epoch": 68.95,
2586
+ "learning_rate": 8.950538537403736e-05,
2587
+ "loss": 2.7291,
2588
+ "step": 3310
2589
+ },
2590
+ {
2591
+ "epoch": 69.16,
2592
+ "learning_rate": 8.840523576495681e-05,
2593
+ "loss": 2.903,
2594
+ "step": 3320
2595
+ },
2596
+ {
2597
+ "epoch": 69.37,
2598
+ "learning_rate": 8.730996767238072e-05,
2599
+ "loss": 2.7319,
2600
+ "step": 3330
2601
+ },
2602
+ {
2603
+ "epoch": 69.58,
2604
+ "learning_rate": 8.621962900688378e-05,
2605
+ "loss": 2.7166,
2606
+ "step": 3340
2607
+ },
2608
+ {
2609
+ "epoch": 69.78,
2610
+ "learning_rate": 8.513426746341128e-05,
2611
+ "loss": 2.7461,
2612
+ "step": 3350
2613
+ },
2614
+ {
2615
+ "epoch": 69.78,
2616
+ "eval_accuracy": 0.44624793300809595,
2617
+ "eval_loss": 3.2212436199188232,
2618
+ "eval_runtime": 128.0443,
2619
+ "eval_samples_per_second": 187.865,
2620
+ "eval_steps_per_second": 5.873,
2621
+ "step": 3350
2622
+ },
2623
+ {
2624
+ "epoch": 69.99,
2625
+ "learning_rate": 8.405393051919333e-05,
2626
+ "loss": 2.7214,
2627
+ "step": 3360
2628
+ },
2629
+ {
2630
+ "epoch": 70.21,
2631
+ "learning_rate": 8.29786654316677e-05,
2632
+ "loss": 2.8969,
2633
+ "step": 3370
2634
+ },
2635
+ {
2636
+ "epoch": 70.41,
2637
+ "learning_rate": 8.190851923641259e-05,
2638
+ "loss": 2.6964,
2639
+ "step": 3380
2640
+ },
2641
+ {
2642
+ "epoch": 70.62,
2643
+ "learning_rate": 8.084353874508947e-05,
2644
+ "loss": 2.7295,
2645
+ "step": 3390
2646
+ },
2647
+ {
2648
+ "epoch": 70.82,
2649
+ "learning_rate": 7.978377054339499e-05,
2650
+ "loss": 2.7179,
2651
+ "step": 3400
2652
+ },
2653
+ {
2654
+ "epoch": 70.82,
2655
+ "eval_accuracy": 0.4470331759822518,
2656
+ "eval_loss": 3.221658706665039,
2657
+ "eval_runtime": 127.9471,
2658
+ "eval_samples_per_second": 188.007,
2659
+ "eval_steps_per_second": 5.877,
2660
+ "step": 3400
2661
+ },
2662
+ {
2663
+ "epoch": 71.04,
2664
+ "learning_rate": 7.872926098902358e-05,
2665
+ "loss": 2.9027,
2666
+ "step": 3410
2667
+ },
2668
+ {
2669
+ "epoch": 71.25,
2670
+ "learning_rate": 7.768005620963916e-05,
2671
+ "loss": 2.7053,
2672
+ "step": 3420
2673
+ },
2674
+ {
2675
+ "epoch": 71.45,
2676
+ "learning_rate": 7.663620210085781e-05,
2677
+ "loss": 2.709,
2678
+ "step": 3430
2679
+ },
2680
+ {
2681
+ "epoch": 71.66,
2682
+ "learning_rate": 7.55977443242399e-05,
2683
+ "loss": 2.7125,
2684
+ "step": 3440
2685
+ },
2686
+ {
2687
+ "epoch": 71.86,
2688
+ "learning_rate": 7.456472830529259e-05,
2689
+ "loss": 2.7184,
2690
+ "step": 3450
2691
+ },
2692
+ {
2693
+ "epoch": 71.86,
2694
+ "eval_accuracy": 0.44788752382659924,
2695
+ "eval_loss": 3.213238000869751,
2696
+ "eval_runtime": 127.9515,
2697
+ "eval_samples_per_second": 188.001,
2698
+ "eval_steps_per_second": 5.877,
2699
+ "step": 3450
2700
+ },
2701
+ {
2702
+ "epoch": 72.08,
2703
+ "learning_rate": 7.353719923148324e-05,
2704
+ "loss": 2.8953,
2705
+ "step": 3460
2706
+ },
2707
+ {
2708
+ "epoch": 72.29,
2709
+ "learning_rate": 7.251520205026205e-05,
2710
+ "loss": 2.6971,
2711
+ "step": 3470
2712
+ },
2713
+ {
2714
+ "epoch": 72.49,
2715
+ "learning_rate": 7.149878146709676e-05,
2716
+ "loss": 2.6983,
2717
+ "step": 3480
2718
+ },
2719
+ {
2720
+ "epoch": 72.7,
2721
+ "learning_rate": 7.048798194351625e-05,
2722
+ "loss": 2.7034,
2723
+ "step": 3490
2724
+ },
2725
+ {
2726
+ "epoch": 72.9,
2727
+ "learning_rate": 6.948284769516627e-05,
2728
+ "loss": 2.7077,
2729
+ "step": 3500
2730
+ },
2731
+ {
2732
+ "epoch": 72.9,
2733
+ "eval_accuracy": 0.44867082596467595,
2734
+ "eval_loss": 3.208606243133545,
2735
+ "eval_runtime": 128.1176,
2736
+ "eval_samples_per_second": 187.757,
2737
+ "eval_steps_per_second": 5.87,
2738
+ "step": 3500
2739
+ },
2740
+ {
2741
+ "epoch": 73.12,
2742
+ "learning_rate": 6.848342268987511e-05,
2743
+ "loss": 2.8784,
2744
+ "step": 3510
2745
+ },
2746
+ {
2747
+ "epoch": 73.33,
2748
+ "learning_rate": 6.748975064573007e-05,
2749
+ "loss": 2.694,
2750
+ "step": 3520
2751
+ },
2752
+ {
2753
+ "epoch": 73.53,
2754
+ "learning_rate": 6.650187502916552e-05,
2755
+ "loss": 2.6991,
2756
+ "step": 3530
2757
+ },
2758
+ {
2759
+ "epoch": 73.74,
2760
+ "learning_rate": 6.551983905306107e-05,
2761
+ "loss": 2.7075,
2762
+ "step": 3540
2763
+ },
2764
+ {
2765
+ "epoch": 73.95,
2766
+ "learning_rate": 6.454368567485183e-05,
2767
+ "loss": 2.6916,
2768
+ "step": 3550
2769
+ },
2770
+ {
2771
+ "epoch": 73.95,
2772
+ "eval_accuracy": 0.44818311301861347,
2773
+ "eval_loss": 3.2057085037231445,
2774
+ "eval_runtime": 128.0769,
2775
+ "eval_samples_per_second": 187.817,
2776
+ "eval_steps_per_second": 5.871,
2777
+ "step": 3550
2778
+ },
2779
+ {
2780
+ "epoch": 74.16,
2781
+ "learning_rate": 6.35734575946487e-05,
2782
+ "loss": 2.884,
2783
+ "step": 3560
2784
+ },
2785
+ {
2786
+ "epoch": 74.37,
2787
+ "learning_rate": 6.260919725337109e-05,
2788
+ "loss": 2.6885,
2789
+ "step": 3570
2790
+ },
2791
+ {
2792
+ "epoch": 74.58,
2793
+ "learning_rate": 6.165094683089015e-05,
2794
+ "loss": 2.7009,
2795
+ "step": 3580
2796
+ },
2797
+ {
2798
+ "epoch": 74.78,
2799
+ "learning_rate": 6.069874824418356e-05,
2800
+ "loss": 2.6924,
2801
+ "step": 3590
2802
+ },
2803
+ {
2804
+ "epoch": 74.99,
2805
+ "learning_rate": 5.975264314550229e-05,
2806
+ "loss": 2.6934,
2807
+ "step": 3600
2808
+ },
2809
+ {
2810
+ "epoch": 74.99,
2811
+ "eval_accuracy": 0.44951231576252726,
2812
+ "eval_loss": 3.201040506362915,
2813
+ "eval_runtime": 128.0178,
2814
+ "eval_samples_per_second": 187.904,
2815
+ "eval_steps_per_second": 5.874,
2816
+ "step": 3600
2817
+ },
2818
+ {
2819
+ "epoch": 75.21,
2820
+ "learning_rate": 5.881267292054828e-05,
2821
+ "loss": 2.8607,
2822
+ "step": 3610
2823
+ },
2824
+ {
2825
+ "epoch": 75.41,
2826
+ "learning_rate": 5.787887868666417e-05,
2827
+ "loss": 2.678,
2828
+ "step": 3620
2829
+ },
2830
+ {
2831
+ "epoch": 75.62,
2832
+ "learning_rate": 5.6951301291034945e-05,
2833
+ "loss": 2.696,
2834
+ "step": 3630
2835
+ },
2836
+ {
2837
+ "epoch": 75.82,
2838
+ "learning_rate": 5.602998130890065e-05,
2839
+ "loss": 2.6944,
2840
+ "step": 3640
2841
+ },
2842
+ {
2843
+ "epoch": 76.04,
2844
+ "learning_rate": 5.511495904178221e-05,
2845
+ "loss": 2.8585,
2846
+ "step": 3650
2847
+ },
2848
+ {
2849
+ "epoch": 76.04,
2850
+ "eval_accuracy": 0.44973287373334114,
2851
+ "eval_loss": 3.1979689598083496,
2852
+ "eval_runtime": 127.9186,
2853
+ "eval_samples_per_second": 188.049,
2854
+ "eval_steps_per_second": 5.879,
2855
+ "step": 3650
2856
+ },
2857
+ {
2858
+ "epoch": 76.25,
2859
+ "learning_rate": 5.4206274515717736e-05,
2860
+ "loss": 2.6924,
2861
+ "step": 3660
2862
+ },
2863
+ {
2864
+ "epoch": 76.45,
2865
+ "learning_rate": 5.330396747951205e-05,
2866
+ "loss": 2.6796,
2867
+ "step": 3670
2868
+ },
2869
+ {
2870
+ "epoch": 76.66,
2871
+ "learning_rate": 5.240807740299811e-05,
2872
+ "loss": 2.684,
2873
+ "step": 3680
2874
+ },
2875
+ {
2876
+ "epoch": 76.86,
2877
+ "learning_rate": 5.1518643475310034e-05,
2878
+ "loss": 2.6842,
2879
+ "step": 3690
2880
+ },
2881
+ {
2882
+ "epoch": 77.08,
2883
+ "learning_rate": 5.0635704603169287e-05,
2884
+ "loss": 2.8559,
2885
+ "step": 3700
2886
+ },
2887
+ {
2888
+ "epoch": 77.08,
2889
+ "eval_accuracy": 0.4502848474176814,
2890
+ "eval_loss": 3.1939539909362793,
2891
+ "eval_runtime": 128.1047,
2892
+ "eval_samples_per_second": 187.776,
2893
+ "eval_steps_per_second": 5.87,
2894
+ "step": 3700
2895
+ },
2896
+ {
2897
+ "epoch": 77.29,
2898
+ "learning_rate": 4.975929940918236e-05,
2899
+ "loss": 2.6777,
2900
+ "step": 3710
2901
+ },
2902
+ {
2903
+ "epoch": 77.49,
2904
+ "learning_rate": 4.8889466230151646e-05,
2905
+ "loss": 2.6673,
2906
+ "step": 3720
2907
+ },
2908
+ {
2909
+ "epoch": 77.7,
2910
+ "learning_rate": 4.8026243115398314e-05,
2911
+ "loss": 2.6694,
2912
+ "step": 3730
2913
+ },
2914
+ {
2915
+ "epoch": 77.9,
2916
+ "learning_rate": 4.7169667825097775e-05,
2917
+ "loss": 2.6734,
2918
+ "step": 3740
2919
+ },
2920
+ {
2921
+ "epoch": 78.12,
2922
+ "learning_rate": 4.631977782862824e-05,
2923
+ "loss": 2.8519,
2924
+ "step": 3750
2925
+ },
2926
+ {
2927
+ "epoch": 78.12,
2928
+ "eval_accuracy": 0.4506198615318044,
2929
+ "eval_loss": 3.1939969062805176,
2930
+ "eval_runtime": 128.0925,
2931
+ "eval_samples_per_second": 187.794,
2932
+ "eval_steps_per_second": 5.871,
2933
+ "step": 3750
2934
+ },
2935
+ {
2936
+ "epoch": 78.33,
2937
+ "learning_rate": 4.547661030293129e-05,
2938
+ "loss": 2.6742,
2939
+ "step": 3760
2940
+ },
2941
+ {
2942
+ "epoch": 78.53,
2943
+ "learning_rate": 4.464020213088611e-05,
2944
+ "loss": 2.6767,
2945
+ "step": 3770
2946
+ },
2947
+ {
2948
+ "epoch": 78.74,
2949
+ "learning_rate": 4.381058989969564e-05,
2950
+ "loss": 2.6641,
2951
+ "step": 3780
2952
+ },
2953
+ {
2954
+ "epoch": 78.95,
2955
+ "learning_rate": 4.298780989928646e-05,
2956
+ "loss": 2.6726,
2957
+ "step": 3790
2958
+ },
2959
+ {
2960
+ "epoch": 79.16,
2961
+ "learning_rate": 4.217189812072131e-05,
2962
+ "loss": 2.8391,
2963
+ "step": 3800
2964
+ },
2965
+ {
2966
+ "epoch": 79.16,
2967
+ "eval_accuracy": 0.4509423513828209,
2968
+ "eval_loss": 3.1897408962249756,
2969
+ "eval_runtime": 127.8217,
2970
+ "eval_samples_per_second": 188.192,
2971
+ "eval_steps_per_second": 5.883,
2972
+ "step": 3800
2973
+ },
2974
+ {
2975
+ "epoch": 79.37,
2976
+ "learning_rate": 4.136289025462443e-05,
2977
+ "loss": 2.6616,
2978
+ "step": 3810
2979
+ },
2980
+ {
2981
+ "epoch": 79.58,
2982
+ "learning_rate": 4.0560821689620856e-05,
2983
+ "loss": 2.6701,
2984
+ "step": 3820
2985
+ },
2986
+ {
2987
+ "epoch": 79.78,
2988
+ "learning_rate": 3.976572751078782e-05,
2989
+ "loss": 2.6546,
2990
+ "step": 3830
2991
+ },
2992
+ {
2993
+ "epoch": 79.99,
2994
+ "learning_rate": 3.8977642498120594e-05,
2995
+ "loss": 2.6719,
2996
+ "step": 3840
2997
+ },
2998
+ {
2999
+ "epoch": 80.21,
3000
+ "learning_rate": 3.819660112501053e-05,
3001
+ "loss": 2.845,
3002
+ "step": 3850
3003
+ },
3004
+ {
3005
+ "epoch": 80.21,
3006
+ "eval_accuracy": 0.45101718878618524,
3007
+ "eval_loss": 3.1857643127441406,
3008
+ "eval_runtime": 127.9876,
3009
+ "eval_samples_per_second": 187.948,
3010
+ "eval_steps_per_second": 5.876,
3011
+ "step": 3850
3012
+ },
3013
+ {
3014
+ "epoch": 80.41,
3015
+ "learning_rate": 3.742263755673758e-05,
3016
+ "loss": 2.6657,
3017
+ "step": 3860
3018
+ },
3019
+ {
3020
+ "epoch": 80.62,
3021
+ "learning_rate": 3.6655785648975585e-05,
3022
+ "loss": 2.6601,
3023
+ "step": 3870
3024
+ },
3025
+ {
3026
+ "epoch": 80.82,
3027
+ "learning_rate": 3.589607894631111e-05,
3028
+ "loss": 2.6666,
3029
+ "step": 3880
3030
+ },
3031
+ {
3032
+ "epoch": 81.04,
3033
+ "learning_rate": 3.514355068077655e-05,
3034
+ "loss": 2.8323,
3035
+ "step": 3890
3036
+ },
3037
+ {
3038
+ "epoch": 81.25,
3039
+ "learning_rate": 3.439823377039599e-05,
3040
+ "loss": 2.6636,
3041
+ "step": 3900
3042
+ },
3043
+ {
3044
+ "epoch": 81.25,
3045
+ "eval_accuracy": 0.45183219751680725,
3046
+ "eval_loss": 3.1818630695343018,
3047
+ "eval_runtime": 128.0672,
3048
+ "eval_samples_per_second": 187.831,
3049
+ "eval_steps_per_second": 5.872,
3050
+ "step": 3900
3051
+ },
3052
+ {
3053
+ "epoch": 81.45,
3054
+ "learning_rate": 3.36601608177457e-05,
3055
+ "loss": 2.6586,
3056
+ "step": 3910
3057
+ },
3058
+ {
3059
+ "epoch": 81.66,
3060
+ "learning_rate": 3.292936410852754e-05,
3061
+ "loss": 2.6674,
3062
+ "step": 3920
3063
+ },
3064
+ {
3065
+ "epoch": 81.86,
3066
+ "learning_rate": 3.220587561015709e-05,
3067
+ "loss": 2.6689,
3068
+ "step": 3930
3069
+ },
3070
+ {
3071
+ "epoch": 82.08,
3072
+ "learning_rate": 3.148972697036507e-05,
3073
+ "loss": 2.8232,
3074
+ "step": 3940
3075
+ },
3076
+ {
3077
+ "epoch": 82.29,
3078
+ "learning_rate": 3.078094951581289e-05,
3079
+ "loss": 2.6569,
3080
+ "step": 3950
3081
+ },
3082
+ {
3083
+ "epoch": 82.29,
3084
+ "eval_accuracy": 0.4517055966540888,
3085
+ "eval_loss": 3.183380603790283,
3086
+ "eval_runtime": 128.002,
3087
+ "eval_samples_per_second": 187.927,
3088
+ "eval_steps_per_second": 5.875,
3089
+ "step": 3950
3090
+ },
3091
+ {
3092
+ "epoch": 82.49,
3093
+ "learning_rate": 3.007957425072265e-05,
3094
+ "loss": 2.6544,
3095
+ "step": 3960
3096
+ },
3097
+ {
3098
+ "epoch": 82.7,
3099
+ "learning_rate": 2.9385631855520546e-05,
3100
+ "loss": 2.6622,
3101
+ "step": 3970
3102
+ },
3103
+ {
3104
+ "epoch": 82.9,
3105
+ "learning_rate": 2.8699152685494925e-05,
3106
+ "loss": 2.6505,
3107
+ "step": 3980
3108
+ },
3109
+ {
3110
+ "epoch": 83.12,
3111
+ "learning_rate": 2.8020166769468616e-05,
3112
+ "loss": 2.8267,
3113
+ "step": 3990
3114
+ },
3115
+ {
3116
+ "epoch": 83.33,
3117
+ "learning_rate": 2.7348703808485223e-05,
3118
+ "loss": 2.647,
3119
+ "step": 4000
3120
+ },
3121
+ {
3122
+ "epoch": 83.33,
3123
+ "eval_accuracy": 0.45166349665740935,
3124
+ "eval_loss": 3.1797752380371094,
3125
+ "eval_runtime": 128.0064,
3126
+ "eval_samples_per_second": 187.92,
3127
+ "eval_steps_per_second": 5.875,
3128
+ "step": 4000
3129
+ },
3130
+ {
3131
+ "epoch": 83.53,
3132
+ "learning_rate": 2.6684793174509915e-05,
3133
+ "loss": 2.6432,
3134
+ "step": 4010
3135
+ },
3136
+ {
3137
+ "epoch": 83.74,
3138
+ "learning_rate": 2.6028463909144574e-05,
3139
+ "loss": 2.6626,
3140
+ "step": 4020
3141
+ },
3142
+ {
3143
+ "epoch": 83.95,
3144
+ "learning_rate": 2.5379744722357403e-05,
3145
+ "loss": 2.6586,
3146
+ "step": 4030
3147
+ },
3148
+ {
3149
+ "epoch": 84.16,
3150
+ "learning_rate": 2.473866399122733e-05,
3151
+ "loss": 2.8349,
3152
+ "step": 4040
3153
+ },
3154
+ {
3155
+ "epoch": 84.37,
3156
+ "learning_rate": 2.410524975870221e-05,
3157
+ "loss": 2.6665,
3158
+ "step": 4050
3159
+ },
3160
+ {
3161
+ "epoch": 84.37,
3162
+ "eval_accuracy": 0.45251206379734554,
3163
+ "eval_loss": 3.178643226623535,
3164
+ "eval_runtime": 127.9924,
3165
+ "eval_samples_per_second": 187.941,
3166
+ "eval_steps_per_second": 5.875,
3167
+ "step": 4050
3168
+ },
3169
+ {
3170
+ "epoch": 84.58,
3171
+ "learning_rate": 2.347952973237262e-05,
3172
+ "loss": 2.6462,
3173
+ "step": 4060
3174
+ },
3175
+ {
3176
+ "epoch": 84.78,
3177
+ "learning_rate": 2.286153128325954e-05,
3178
+ "loss": 2.6444,
3179
+ "step": 4070
3180
+ },
3181
+ {
3182
+ "epoch": 84.99,
3183
+ "learning_rate": 2.2251281444617257e-05,
3184
+ "loss": 2.6442,
3185
+ "step": 4080
3186
+ },
3187
+ {
3188
+ "epoch": 85.21,
3189
+ "learning_rate": 2.1648806910750575e-05,
3190
+ "loss": 2.8258,
3191
+ "step": 4090
3192
+ },
3193
+ {
3194
+ "epoch": 85.41,
3195
+ "learning_rate": 2.1054134035847307e-05,
3196
+ "loss": 2.6382,
3197
+ "step": 4100
3198
+ },
3199
+ {
3200
+ "epoch": 85.41,
3201
+ "eval_accuracy": 0.4524524692406656,
3202
+ "eval_loss": 3.173250198364258,
3203
+ "eval_runtime": 127.972,
3204
+ "eval_samples_per_second": 187.971,
3205
+ "eval_steps_per_second": 5.876,
3206
+ "step": 4100
3207
+ },
3208
+ {
3209
+ "epoch": 85.62,
3210
+ "learning_rate": 2.0467288832825583e-05,
3211
+ "loss": 2.6655,
3212
+ "step": 4110
3213
+ },
3214
+ {
3215
+ "epoch": 85.82,
3216
+ "learning_rate": 1.9888296972195587e-05,
3217
+ "loss": 2.6459,
3218
+ "step": 4120
3219
+ },
3220
+ {
3221
+ "epoch": 86.04,
3222
+ "learning_rate": 1.931718378093703e-05,
3223
+ "loss": 2.8333,
3224
+ "step": 4130
3225
+ },
3226
+ {
3227
+ "epoch": 86.25,
3228
+ "learning_rate": 1.875397424139109e-05,
3229
+ "loss": 2.6533,
3230
+ "step": 4140
3231
+ },
3232
+ {
3233
+ "epoch": 86.45,
3234
+ "learning_rate": 1.81986929901675e-05,
3235
+ "loss": 2.6346,
3236
+ "step": 4150
3237
+ },
3238
+ {
3239
+ "epoch": 86.45,
3240
+ "eval_accuracy": 0.4532207710219251,
3241
+ "eval_loss": 3.1699652671813965,
3242
+ "eval_runtime": 128.0545,
3243
+ "eval_samples_per_second": 187.85,
3244
+ "eval_steps_per_second": 5.872,
3245
+ "step": 4150
3246
+ },
3247
+ {
3248
+ "epoch": 86.66,
3249
+ "learning_rate": 1.765136431706711e-05,
3250
+ "loss": 2.6558,
3251
+ "step": 4160
3252
+ },
3253
+ {
3254
+ "epoch": 86.86,
3255
+ "learning_rate": 1.711201216401912e-05,
3256
+ "loss": 2.6422,
3257
+ "step": 4170
3258
+ },
3259
+ {
3260
+ "epoch": 87.08,
3261
+ "learning_rate": 1.6580660124034032e-05,
3262
+ "loss": 2.8243,
3263
+ "step": 4180
3264
+ },
3265
+ {
3266
+ "epoch": 87.29,
3267
+ "learning_rate": 1.605733144017132e-05,
3268
+ "loss": 2.6443,
3269
+ "step": 4190
3270
+ },
3271
+ {
3272
+ "epoch": 87.49,
3273
+ "learning_rate": 1.5542049004523053e-05,
3274
+ "loss": 2.6457,
3275
+ "step": 4200
3276
+ },
3277
+ {
3278
+ "epoch": 87.49,
3279
+ "eval_accuracy": 0.45290158205139586,
3280
+ "eval_loss": 3.1713671684265137,
3281
+ "eval_runtime": 127.871,
3282
+ "eval_samples_per_second": 188.119,
3283
+ "eval_steps_per_second": 5.881,
3284
+ "step": 4200
3285
+ },
3286
+ {
3287
+ "epoch": 87.7,
3288
+ "learning_rate": 1.503483535721224e-05,
3289
+ "loss": 2.6578,
3290
+ "step": 4210
3291
+ },
3292
+ {
3293
+ "epoch": 87.9,
3294
+ "learning_rate": 1.4535712685406921e-05,
3295
+ "loss": 2.646,
3296
+ "step": 4220
3297
+ },
3298
+ {
3299
+ "epoch": 88.12,
3300
+ "learning_rate": 1.4044702822349731e-05,
3301
+ "loss": 2.8075,
3302
+ "step": 4230
3303
+ },
3304
+ {
3305
+ "epoch": 88.33,
3306
+ "learning_rate": 1.3561827246402692e-05,
3307
+ "loss": 2.6405,
3308
+ "step": 4240
3309
+ },
3310
+ {
3311
+ "epoch": 88.53,
3312
+ "learning_rate": 1.3087107080107853e-05,
3313
+ "loss": 2.6328,
3314
+ "step": 4250
3315
+ },
3316
+ {
3317
+ "epoch": 88.53,
3318
+ "eval_accuracy": 0.45369892675477164,
3319
+ "eval_loss": 3.168638229370117,
3320
+ "eval_runtime": 128.0728,
3321
+ "eval_samples_per_second": 187.823,
3322
+ "eval_steps_per_second": 5.872,
3323
+ "step": 4250
3324
+ },
3325
+ {
3326
+ "epoch": 88.74,
3327
+ "learning_rate": 1.2620563089263093e-05,
3328
+ "loss": 2.6377,
3329
+ "step": 4260
3330
+ },
3331
+ {
3332
+ "epoch": 88.95,
3333
+ "learning_rate": 1.2162215682014012e-05,
3334
+ "loss": 2.6645,
3335
+ "step": 4270
3336
+ },
3337
+ {
3338
+ "epoch": 89.16,
3339
+ "learning_rate": 1.1712084907961053e-05,
3340
+ "loss": 2.8112,
3341
+ "step": 4280
3342
+ },
3343
+ {
3344
+ "epoch": 89.37,
3345
+ "learning_rate": 1.127019045728246e-05,
3346
+ "loss": 2.6445,
3347
+ "step": 4290
3348
+ },
3349
+ {
3350
+ "epoch": 89.58,
3351
+ "learning_rate": 1.0836551659873074e-05,
3352
+ "loss": 2.6429,
3353
+ "step": 4300
3354
+ },
3355
+ {
3356
+ "epoch": 89.58,
3357
+ "eval_accuracy": 0.4534057770055075,
3358
+ "eval_loss": 3.171478033065796,
3359
+ "eval_runtime": 128.1308,
3360
+ "eval_samples_per_second": 187.738,
3361
+ "eval_steps_per_second": 5.869,
3362
+ "step": 4300
3363
+ },
3364
+ {
3365
+ "epoch": 89.78,
3366
+ "learning_rate": 1.0411187484498652e-05,
3367
+ "loss": 2.6458,
3368
+ "step": 4310
3369
+ },
3370
+ {
3371
+ "epoch": 89.99,
3372
+ "learning_rate": 9.99411653796627e-06,
3373
+ "loss": 2.6433,
3374
+ "step": 4320
3375
+ },
3376
+ {
3377
+ "epoch": 90.21,
3378
+ "learning_rate": 9.58535706431023e-06,
3379
+ "loss": 2.8107,
3380
+ "step": 4330
3381
+ },
3382
+ {
3383
+ "epoch": 90.41,
3384
+ "learning_rate": 9.184926943994044e-06,
3385
+ "loss": 2.6428,
3386
+ "step": 4340
3387
+ },
3388
+ {
3389
+ "epoch": 90.62,
3390
+ "learning_rate": 8.792843693128471e-06,
3391
+ "loss": 2.6369,
3392
+ "step": 4350
3393
+ },
3394
+ {
3395
+ "epoch": 90.62,
3396
+ "eval_accuracy": 0.4537699711828984,
3397
+ "eval_loss": 3.1687278747558594,
3398
+ "eval_runtime": 127.8724,
3399
+ "eval_samples_per_second": 188.117,
3400
+ "eval_steps_per_second": 5.881,
3401
+ "step": 4350
3402
+ },
3403
+ {
3404
+ "epoch": 90.82,
3405
+ "learning_rate": 8.409124462705032e-06,
3406
+ "loss": 2.6513,
3407
+ "step": 4360
3408
+ },
3409
+ {
3410
+ "epoch": 91.04,
3411
+ "learning_rate": 8.033786037845992e-06,
3412
+ "loss": 2.8,
3413
+ "step": 4370
3414
+ },
3415
+ {
3416
+ "epoch": 91.25,
3417
+ "learning_rate": 7.66684483706992e-06,
3418
+ "loss": 2.6456,
3419
+ "step": 4380
3420
+ },
3421
+ {
3422
+ "epoch": 91.45,
3423
+ "learning_rate": 7.308316911573721e-06,
3424
+ "loss": 2.6429,
3425
+ "step": 4390
3426
+ },
3427
+ {
3428
+ "epoch": 91.66,
3429
+ "learning_rate": 6.958217944530287e-06,
3430
+ "loss": 2.628,
3431
+ "step": 4400
3432
+ },
3433
+ {
3434
+ "epoch": 91.66,
3435
+ "eval_accuracy": 0.4539267525748088,
3436
+ "eval_loss": 3.165127754211426,
3437
+ "eval_runtime": 128.0531,
3438
+ "eval_samples_per_second": 187.852,
3439
+ "eval_steps_per_second": 5.873,
3440
+ "step": 4400
3441
+ },
3442
+ {
3443
+ "epoch": 91.86,
3444
+ "learning_rate": 6.616563250402585e-06,
3445
+ "loss": 2.6337,
3446
+ "step": 4410
3447
+ },
3448
+ {
3449
+ "epoch": 92.08,
3450
+ "learning_rate": 6.283367774273785e-06,
3451
+ "loss": 2.8133,
3452
+ "step": 4420
3453
+ },
3454
+ {
3455
+ "epoch": 92.29,
3456
+ "learning_rate": 5.958646091193387e-06,
3457
+ "loss": 2.6318,
3458
+ "step": 4430
3459
+ },
3460
+ {
3461
+ "epoch": 92.49,
3462
+ "learning_rate": 5.642412405539798e-06,
3463
+ "loss": 2.6365,
3464
+ "step": 4440
3465
+ },
3466
+ {
3467
+ "epoch": 92.7,
3468
+ "learning_rate": 5.334680550398852e-06,
3469
+ "loss": 2.6373,
3470
+ "step": 4450
3471
+ },
3472
+ {
3473
+ "epoch": 92.7,
3474
+ "eval_accuracy": 0.4538699774217477,
3475
+ "eval_loss": 3.1659765243530273,
3476
+ "eval_runtime": 128.2044,
3477
+ "eval_samples_per_second": 187.63,
3478
+ "eval_steps_per_second": 5.866,
3479
+ "step": 4450
3480
+ },
3481
+ {
3482
+ "epoch": 92.9,
3483
+ "learning_rate": 5.0354639869588e-06,
3484
+ "loss": 2.6355,
3485
+ "step": 4460
3486
+ },
3487
+ {
3488
+ "epoch": 93.12,
3489
+ "learning_rate": 4.744775803921475e-06,
3490
+ "loss": 2.8102,
3491
+ "step": 4470
3492
+ },
3493
+ {
3494
+ "epoch": 93.33,
3495
+ "learning_rate": 4.4626287169296846e-06,
3496
+ "loss": 2.6362,
3497
+ "step": 4480
3498
+ },
3499
+ {
3500
+ "epoch": 93.53,
3501
+ "learning_rate": 4.189035068011071e-06,
3502
+ "loss": 2.6226,
3503
+ "step": 4490
3504
+ },
3505
+ {
3506
+ "epoch": 93.74,
3507
+ "learning_rate": 3.924006825038129e-06,
3508
+ "loss": 2.6357,
3509
+ "step": 4500
3510
+ },
3511
+ {
3512
+ "epoch": 93.74,
3513
+ "eval_accuracy": 0.4537227251216398,
3514
+ "eval_loss": 3.1661999225616455,
3515
+ "eval_runtime": 128.0693,
3516
+ "eval_samples_per_second": 187.828,
3517
+ "eval_steps_per_second": 5.872,
3518
+ "step": 4500
3519
+ },
3520
+ {
3521
+ "epoch": 93.95,
3522
+ "learning_rate": 3.6675555812047956e-06,
3523
+ "loss": 2.6477,
3524
+ "step": 4510
3525
+ },
3526
+ {
3527
+ "epoch": 94.16,
3528
+ "learning_rate": 3.4196925545192604e-06,
3529
+ "loss": 2.808,
3530
+ "step": 4520
3531
+ },
3532
+ {
3533
+ "epoch": 94.37,
3534
+ "learning_rate": 3.1804285873132668e-06,
3535
+ "loss": 2.6339,
3536
+ "step": 4530
3537
+ },
3538
+ {
3539
+ "epoch": 94.58,
3540
+ "learning_rate": 2.9497741457678695e-06,
3541
+ "loss": 2.6228,
3542
+ "step": 4540
3543
+ },
3544
+ {
3545
+ "epoch": 94.78,
3546
+ "learning_rate": 2.7277393194555358e-06,
3547
+ "loss": 2.6302,
3548
+ "step": 4550
3549
+ },
3550
+ {
3551
+ "epoch": 94.78,
3552
+ "eval_accuracy": 0.45330473365422363,
3553
+ "eval_loss": 3.1695384979248047,
3554
+ "eval_runtime": 128.0386,
3555
+ "eval_samples_per_second": 187.873,
3556
+ "eval_steps_per_second": 5.873,
3557
+ "step": 4550
3558
+ },
3559
+ {
3560
+ "epoch": 94.78,
3561
+ "step": 4550,
3562
+ "total_flos": 1.602912650550436e+17,
3563
+ "train_loss": 3.832834072322636,
3564
+ "train_runtime": 39205.3218,
3565
+ "train_samples_per_second": 63.537,
3566
+ "train_steps_per_second": 0.122
3567
+ }
3568
+ ],
3569
+ "max_steps": 4800,
3570
+ "num_train_epochs": 100,
3571
+ "total_flos": 1.602912650550436e+17,
3572
+ "trial_name": null,
3573
+ "trial_params": null
3574
+ }