tmnam20 committed on
Commit
27d5c72
1 Parent(s): 1d5a47e

Upload trainer_state.json with huggingface_hub

Files changed (1)
  1. trainer_state.json +4509 -0
trainer_state.json ADDED
@@ -0,0 +1,4509 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 3.0,
+ "eval_steps": 5000,
+ "global_step": 36816,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0,
+ "learning_rate": 1.9972837896566713e-05,
+ "loss": 1.0389,
+ "step": 50
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 1.994567579313342e-05,
+ "loss": 0.9783,
+ "step": 100
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 1.9918513689700133e-05,
+ "loss": 0.8949,
+ "step": 150
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.9891351586266844e-05,
+ "loss": 0.8832,
+ "step": 200
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.9864189482833552e-05,
+ "loss": 0.8377,
+ "step": 250
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.983702737940026e-05,
+ "loss": 0.8297,
+ "step": 300
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.9809865275966972e-05,
+ "loss": 0.7661,
+ "step": 350
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.978270317253368e-05,
+ "loss": 0.803,
+ "step": 400
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.9755541069100392e-05,
+ "loss": 0.7971,
+ "step": 450
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.9728378965667103e-05,
+ "loss": 0.7882,
+ "step": 500
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.9701216862233815e-05,
+ "loss": 0.7406,
+ "step": 550
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 1.9674054758800523e-05,
+ "loss": 0.7539,
+ "step": 600
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 1.9646892655367235e-05,
+ "loss": 0.7562,
+ "step": 650
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 1.9619730551933943e-05,
+ "loss": 0.7525,
+ "step": 700
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 1.9592568448500654e-05,
+ "loss": 0.7378,
+ "step": 750
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 1.9565406345067362e-05,
+ "loss": 0.738,
+ "step": 800
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 1.9538244241634074e-05,
+ "loss": 0.727,
+ "step": 850
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 1.9511082138200782e-05,
+ "loss": 0.7254,
+ "step": 900
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 1.9483920034767494e-05,
+ "loss": 0.7433,
+ "step": 950
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 1.9456757931334205e-05,
+ "loss": 0.7547,
+ "step": 1000
+ },
131
+ {
132
+ "epoch": 0.09,
133
+ "learning_rate": 1.9429595827900913e-05,
134
+ "loss": 0.7251,
135
+ "step": 1050
136
+ },
137
+ {
138
+ "epoch": 0.09,
139
+ "learning_rate": 1.9402433724467625e-05,
140
+ "loss": 0.7203,
141
+ "step": 1100
142
+ },
143
+ {
144
+ "epoch": 0.09,
145
+ "learning_rate": 1.9375271621034336e-05,
146
+ "loss": 0.704,
147
+ "step": 1150
148
+ },
149
+ {
150
+ "epoch": 0.1,
151
+ "learning_rate": 1.9348109517601044e-05,
152
+ "loss": 0.7271,
153
+ "step": 1200
154
+ },
155
+ {
156
+ "epoch": 0.1,
157
+ "learning_rate": 1.9320947414167756e-05,
158
+ "loss": 0.6955,
159
+ "step": 1250
160
+ },
161
+ {
162
+ "epoch": 0.11,
163
+ "learning_rate": 1.9293785310734464e-05,
164
+ "loss": 0.6565,
165
+ "step": 1300
166
+ },
167
+ {
168
+ "epoch": 0.11,
169
+ "learning_rate": 1.9266623207301176e-05,
170
+ "loss": 0.7266,
171
+ "step": 1350
172
+ },
173
+ {
174
+ "epoch": 0.11,
175
+ "learning_rate": 1.9239461103867884e-05,
176
+ "loss": 0.6809,
177
+ "step": 1400
178
+ },
179
+ {
180
+ "epoch": 0.12,
181
+ "learning_rate": 1.9212299000434595e-05,
182
+ "loss": 0.7345,
183
+ "step": 1450
184
+ },
185
+ {
186
+ "epoch": 0.12,
187
+ "learning_rate": 1.9185136897001307e-05,
188
+ "loss": 0.7052,
189
+ "step": 1500
190
+ },
191
+ {
192
+ "epoch": 0.13,
193
+ "learning_rate": 1.9157974793568015e-05,
194
+ "loss": 0.7097,
195
+ "step": 1550
196
+ },
197
+ {
198
+ "epoch": 0.13,
199
+ "learning_rate": 1.9130812690134726e-05,
200
+ "loss": 0.7224,
201
+ "step": 1600
202
+ },
203
+ {
204
+ "epoch": 0.13,
205
+ "learning_rate": 1.9103650586701438e-05,
206
+ "loss": 0.6998,
207
+ "step": 1650
208
+ },
209
+ {
210
+ "epoch": 0.14,
211
+ "learning_rate": 1.9076488483268146e-05,
212
+ "loss": 0.6799,
213
+ "step": 1700
214
+ },
215
+ {
216
+ "epoch": 0.14,
217
+ "learning_rate": 1.9049326379834858e-05,
218
+ "loss": 0.6957,
219
+ "step": 1750
220
+ },
221
+ {
222
+ "epoch": 0.15,
223
+ "learning_rate": 1.9022164276401566e-05,
224
+ "loss": 0.7111,
225
+ "step": 1800
226
+ },
227
+ {
228
+ "epoch": 0.15,
229
+ "learning_rate": 1.8995002172968274e-05,
230
+ "loss": 0.7109,
231
+ "step": 1850
232
+ },
233
+ {
234
+ "epoch": 0.15,
235
+ "learning_rate": 1.8967840069534985e-05,
236
+ "loss": 0.7088,
237
+ "step": 1900
238
+ },
239
+ {
240
+ "epoch": 0.16,
241
+ "learning_rate": 1.8940677966101697e-05,
242
+ "loss": 0.6694,
243
+ "step": 1950
244
+ },
245
+ {
246
+ "epoch": 0.16,
247
+ "learning_rate": 1.8913515862668405e-05,
248
+ "loss": 0.6598,
249
+ "step": 2000
250
+ },
251
+ {
252
+ "epoch": 0.17,
253
+ "learning_rate": 1.8886353759235117e-05,
254
+ "loss": 0.6834,
255
+ "step": 2050
256
+ },
257
+ {
258
+ "epoch": 0.17,
259
+ "learning_rate": 1.8859191655801828e-05,
260
+ "loss": 0.7251,
261
+ "step": 2100
262
+ },
263
+ {
264
+ "epoch": 0.18,
265
+ "learning_rate": 1.8832029552368536e-05,
266
+ "loss": 0.6978,
267
+ "step": 2150
268
+ },
269
+ {
270
+ "epoch": 0.18,
271
+ "learning_rate": 1.8804867448935248e-05,
272
+ "loss": 0.6761,
273
+ "step": 2200
274
+ },
275
+ {
276
+ "epoch": 0.18,
277
+ "learning_rate": 1.8777705345501956e-05,
278
+ "loss": 0.6638,
279
+ "step": 2250
280
+ },
281
+ {
282
+ "epoch": 0.19,
283
+ "learning_rate": 1.8750543242068668e-05,
284
+ "loss": 0.6987,
285
+ "step": 2300
286
+ },
287
+ {
288
+ "epoch": 0.19,
289
+ "learning_rate": 1.8723381138635376e-05,
290
+ "loss": 0.6922,
291
+ "step": 2350
292
+ },
293
+ {
294
+ "epoch": 0.2,
295
+ "learning_rate": 1.8696219035202087e-05,
296
+ "loss": 0.6764,
297
+ "step": 2400
298
+ },
299
+ {
300
+ "epoch": 0.2,
301
+ "learning_rate": 1.86690569317688e-05,
302
+ "loss": 0.6812,
303
+ "step": 2450
304
+ },
305
+ {
306
+ "epoch": 0.2,
307
+ "learning_rate": 1.8641894828335507e-05,
308
+ "loss": 0.6764,
309
+ "step": 2500
310
+ },
311
+ {
312
+ "epoch": 0.21,
313
+ "learning_rate": 1.861473272490222e-05,
314
+ "loss": 0.6856,
315
+ "step": 2550
316
+ },
317
+ {
318
+ "epoch": 0.21,
319
+ "learning_rate": 1.858757062146893e-05,
320
+ "loss": 0.7037,
321
+ "step": 2600
322
+ },
323
+ {
324
+ "epoch": 0.22,
325
+ "learning_rate": 1.8560408518035638e-05,
326
+ "loss": 0.6843,
327
+ "step": 2650
328
+ },
329
+ {
330
+ "epoch": 0.22,
331
+ "learning_rate": 1.853324641460235e-05,
332
+ "loss": 0.6761,
333
+ "step": 2700
334
+ },
335
+ {
336
+ "epoch": 0.22,
337
+ "learning_rate": 1.8506084311169058e-05,
338
+ "loss": 0.6778,
339
+ "step": 2750
340
+ },
341
+ {
342
+ "epoch": 0.23,
343
+ "learning_rate": 1.847892220773577e-05,
344
+ "loss": 0.6733,
345
+ "step": 2800
346
+ },
347
+ {
348
+ "epoch": 0.23,
349
+ "learning_rate": 1.8451760104302477e-05,
350
+ "loss": 0.6711,
351
+ "step": 2850
352
+ },
353
+ {
354
+ "epoch": 0.24,
355
+ "learning_rate": 1.842459800086919e-05,
356
+ "loss": 0.684,
357
+ "step": 2900
358
+ },
359
+ {
360
+ "epoch": 0.24,
361
+ "learning_rate": 1.8397435897435897e-05,
362
+ "loss": 0.6783,
363
+ "step": 2950
364
+ },
365
+ {
366
+ "epoch": 0.24,
367
+ "learning_rate": 1.837027379400261e-05,
368
+ "loss": 0.6788,
369
+ "step": 3000
370
+ },
371
+ {
372
+ "epoch": 0.25,
373
+ "learning_rate": 1.834311169056932e-05,
374
+ "loss": 0.6497,
375
+ "step": 3050
376
+ },
377
+ {
378
+ "epoch": 0.25,
379
+ "learning_rate": 1.8315949587136028e-05,
380
+ "loss": 0.6246,
381
+ "step": 3100
382
+ },
383
+ {
384
+ "epoch": 0.26,
385
+ "learning_rate": 1.828878748370274e-05,
386
+ "loss": 0.6497,
387
+ "step": 3150
388
+ },
389
+ {
390
+ "epoch": 0.26,
391
+ "learning_rate": 1.826162538026945e-05,
392
+ "loss": 0.663,
393
+ "step": 3200
394
+ },
395
+ {
396
+ "epoch": 0.26,
397
+ "learning_rate": 1.823446327683616e-05,
398
+ "loss": 0.6542,
399
+ "step": 3250
400
+ },
401
+ {
402
+ "epoch": 0.27,
403
+ "learning_rate": 1.820730117340287e-05,
404
+ "loss": 0.6734,
405
+ "step": 3300
406
+ },
407
+ {
408
+ "epoch": 0.27,
409
+ "learning_rate": 1.818013906996958e-05,
410
+ "loss": 0.6548,
411
+ "step": 3350
412
+ },
413
+ {
414
+ "epoch": 0.28,
415
+ "learning_rate": 1.815297696653629e-05,
416
+ "loss": 0.6362,
417
+ "step": 3400
418
+ },
419
+ {
420
+ "epoch": 0.28,
421
+ "learning_rate": 1.8125814863103e-05,
422
+ "loss": 0.6544,
423
+ "step": 3450
424
+ },
425
+ {
426
+ "epoch": 0.29,
427
+ "learning_rate": 1.809865275966971e-05,
428
+ "loss": 0.658,
429
+ "step": 3500
430
+ },
431
+ {
432
+ "epoch": 0.29,
433
+ "learning_rate": 1.8071490656236422e-05,
434
+ "loss": 0.6443,
435
+ "step": 3550
436
+ },
437
+ {
438
+ "epoch": 0.29,
439
+ "learning_rate": 1.804432855280313e-05,
440
+ "loss": 0.6707,
441
+ "step": 3600
442
+ },
443
+ {
444
+ "epoch": 0.3,
445
+ "learning_rate": 1.801716644936984e-05,
446
+ "loss": 0.6438,
447
+ "step": 3650
448
+ },
449
+ {
450
+ "epoch": 0.3,
451
+ "learning_rate": 1.7990004345936553e-05,
452
+ "loss": 0.6871,
453
+ "step": 3700
454
+ },
455
+ {
456
+ "epoch": 0.31,
457
+ "learning_rate": 1.796284224250326e-05,
458
+ "loss": 0.6694,
459
+ "step": 3750
460
+ },
461
+ {
462
+ "epoch": 0.31,
463
+ "learning_rate": 1.793568013906997e-05,
464
+ "loss": 0.6279,
465
+ "step": 3800
466
+ },
467
+ {
468
+ "epoch": 0.31,
469
+ "learning_rate": 1.790851803563668e-05,
470
+ "loss": 0.634,
471
+ "step": 3850
472
+ },
473
+ {
474
+ "epoch": 0.32,
475
+ "learning_rate": 1.788135593220339e-05,
476
+ "loss": 0.6535,
477
+ "step": 3900
478
+ },
479
+ {
480
+ "epoch": 0.32,
481
+ "learning_rate": 1.78541938287701e-05,
482
+ "loss": 0.6388,
483
+ "step": 3950
484
+ },
485
+ {
486
+ "epoch": 0.33,
487
+ "learning_rate": 1.7827031725336812e-05,
488
+ "loss": 0.6227,
489
+ "step": 4000
490
+ },
491
+ {
492
+ "epoch": 0.33,
493
+ "learning_rate": 1.7799869621903524e-05,
494
+ "loss": 0.6741,
495
+ "step": 4050
496
+ },
497
+ {
498
+ "epoch": 0.33,
499
+ "learning_rate": 1.777270751847023e-05,
500
+ "loss": 0.6566,
501
+ "step": 4100
502
+ },
503
+ {
504
+ "epoch": 0.34,
505
+ "learning_rate": 1.7745545415036943e-05,
506
+ "loss": 0.6695,
507
+ "step": 4150
508
+ },
509
+ {
510
+ "epoch": 0.34,
511
+ "learning_rate": 1.771838331160365e-05,
512
+ "loss": 0.6454,
513
+ "step": 4200
514
+ },
515
+ {
516
+ "epoch": 0.35,
517
+ "learning_rate": 1.7691221208170363e-05,
518
+ "loss": 0.6531,
519
+ "step": 4250
520
+ },
521
+ {
522
+ "epoch": 0.35,
523
+ "learning_rate": 1.766405910473707e-05,
524
+ "loss": 0.6287,
525
+ "step": 4300
526
+ },
527
+ {
528
+ "epoch": 0.35,
529
+ "learning_rate": 1.7636897001303783e-05,
530
+ "loss": 0.6356,
531
+ "step": 4350
532
+ },
533
+ {
534
+ "epoch": 0.36,
535
+ "learning_rate": 1.760973489787049e-05,
536
+ "loss": 0.6609,
537
+ "step": 4400
538
+ },
539
+ {
540
+ "epoch": 0.36,
541
+ "learning_rate": 1.7582572794437202e-05,
542
+ "loss": 0.6502,
543
+ "step": 4450
544
+ },
545
+ {
546
+ "epoch": 0.37,
547
+ "learning_rate": 1.7555410691003914e-05,
548
+ "loss": 0.6211,
549
+ "step": 4500
550
+ },
551
+ {
552
+ "epoch": 0.37,
553
+ "learning_rate": 1.7528248587570622e-05,
554
+ "loss": 0.6432,
555
+ "step": 4550
556
+ },
557
+ {
558
+ "epoch": 0.37,
559
+ "learning_rate": 1.7501086484137333e-05,
560
+ "loss": 0.6416,
561
+ "step": 4600
562
+ },
563
+ {
564
+ "epoch": 0.38,
565
+ "learning_rate": 1.7473924380704045e-05,
566
+ "loss": 0.6211,
567
+ "step": 4650
568
+ },
569
+ {
570
+ "epoch": 0.38,
571
+ "learning_rate": 1.7446762277270753e-05,
572
+ "loss": 0.6537,
573
+ "step": 4700
574
+ },
575
+ {
576
+ "epoch": 0.39,
577
+ "learning_rate": 1.7419600173837465e-05,
578
+ "loss": 0.6001,
579
+ "step": 4750
580
+ },
581
+ {
582
+ "epoch": 0.39,
583
+ "learning_rate": 1.7392438070404173e-05,
584
+ "loss": 0.6057,
585
+ "step": 4800
586
+ },
587
+ {
588
+ "epoch": 0.4,
589
+ "learning_rate": 1.7365275966970884e-05,
590
+ "loss": 0.6716,
591
+ "step": 4850
592
+ },
593
+ {
594
+ "epoch": 0.4,
595
+ "learning_rate": 1.7338113863537592e-05,
596
+ "loss": 0.6447,
597
+ "step": 4900
598
+ },
599
+ {
600
+ "epoch": 0.4,
601
+ "learning_rate": 1.7310951760104304e-05,
602
+ "loss": 0.6371,
603
+ "step": 4950
604
+ },
605
+ {
606
+ "epoch": 0.41,
607
+ "learning_rate": 1.7283789656671015e-05,
608
+ "loss": 0.6369,
609
+ "step": 5000
610
+ },
611
+ {
612
+ "epoch": 0.41,
613
+ "eval_accuracy": 0.7400916963830871,
614
+ "eval_loss": 0.6398988962173462,
615
+ "eval_runtime": 16.7037,
616
+ "eval_samples_per_second": 587.595,
617
+ "eval_steps_per_second": 36.758,
618
+ "step": 5000
619
+ },
+ {
+ "epoch": 0.41,
+ "learning_rate": 1.7256627553237724e-05,
+ "loss": 0.6108,
+ "step": 5050
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 1.7229465449804435e-05,
+ "loss": 0.6118,
+ "step": 5100
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 1.7202303346371147e-05,
+ "loss": 0.6306,
+ "step": 5150
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 1.7175141242937855e-05,
+ "loss": 0.6089,
+ "step": 5200
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 1.7147979139504566e-05,
+ "loss": 0.6283,
+ "step": 5250
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 1.7120817036071274e-05,
+ "loss": 0.6383,
+ "step": 5300
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 1.7093654932637983e-05,
+ "loss": 0.6179,
+ "step": 5350
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 1.7066492829204694e-05,
+ "loss": 0.6357,
+ "step": 5400
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 1.7039330725771406e-05,
+ "loss": 0.634,
+ "step": 5450
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 1.7012168622338114e-05,
+ "loss": 0.5852,
+ "step": 5500
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 1.6985006518904825e-05,
+ "loss": 0.6249,
+ "step": 5550
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 1.6957844415471537e-05,
+ "loss": 0.6043,
+ "step": 5600
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 1.6930682312038245e-05,
+ "loss": 0.6171,
+ "step": 5650
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 1.6903520208604957e-05,
+ "loss": 0.6358,
+ "step": 5700
+ },
+ {
+ "epoch": 0.47,
+ "learning_rate": 1.6876358105171665e-05,
+ "loss": 0.6129,
+ "step": 5750
+ },
+ {
+ "epoch": 0.47,
+ "learning_rate": 1.6849196001738376e-05,
+ "loss": 0.6069,
+ "step": 5800
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 1.6822033898305084e-05,
+ "loss": 0.6553,
+ "step": 5850
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 1.6794871794871796e-05,
+ "loss": 0.6372,
+ "step": 5900
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 1.6767709691438507e-05,
+ "loss": 0.6348,
+ "step": 5950
+ },
+ {
+ "epoch": 0.49,
+ "learning_rate": 1.6740547588005215e-05,
+ "loss": 0.596,
+ "step": 6000
+ },
740
+ {
741
+ "epoch": 0.49,
742
+ "learning_rate": 1.6713385484571927e-05,
743
+ "loss": 0.6192,
744
+ "step": 6050
745
+ },
746
+ {
747
+ "epoch": 0.5,
748
+ "learning_rate": 1.668622338113864e-05,
749
+ "loss": 0.6435,
750
+ "step": 6100
751
+ },
752
+ {
753
+ "epoch": 0.5,
754
+ "learning_rate": 1.6659061277705347e-05,
755
+ "loss": 0.6135,
756
+ "step": 6150
757
+ },
758
+ {
759
+ "epoch": 0.51,
760
+ "learning_rate": 1.6631899174272058e-05,
761
+ "loss": 0.6476,
762
+ "step": 6200
763
+ },
764
+ {
765
+ "epoch": 0.51,
766
+ "learning_rate": 1.6604737070838766e-05,
767
+ "loss": 0.646,
768
+ "step": 6250
769
+ },
770
+ {
771
+ "epoch": 0.51,
772
+ "learning_rate": 1.6577574967405478e-05,
773
+ "loss": 0.6104,
774
+ "step": 6300
775
+ },
776
+ {
777
+ "epoch": 0.52,
778
+ "learning_rate": 1.6550412863972186e-05,
779
+ "loss": 0.6151,
780
+ "step": 6350
781
+ },
782
+ {
783
+ "epoch": 0.52,
784
+ "learning_rate": 1.6523250760538898e-05,
785
+ "loss": 0.6314,
786
+ "step": 6400
787
+ },
788
+ {
789
+ "epoch": 0.53,
790
+ "learning_rate": 1.6496088657105606e-05,
791
+ "loss": 0.6256,
792
+ "step": 6450
793
+ },
794
+ {
795
+ "epoch": 0.53,
796
+ "learning_rate": 1.6468926553672317e-05,
797
+ "loss": 0.6284,
798
+ "step": 6500
799
+ },
800
+ {
801
+ "epoch": 0.53,
802
+ "learning_rate": 1.644176445023903e-05,
803
+ "loss": 0.6134,
804
+ "step": 6550
805
+ },
806
+ {
807
+ "epoch": 0.54,
808
+ "learning_rate": 1.6414602346805737e-05,
809
+ "loss": 0.6203,
810
+ "step": 6600
811
+ },
812
+ {
813
+ "epoch": 0.54,
814
+ "learning_rate": 1.638744024337245e-05,
815
+ "loss": 0.5652,
816
+ "step": 6650
817
+ },
818
+ {
819
+ "epoch": 0.55,
820
+ "learning_rate": 1.636027813993916e-05,
821
+ "loss": 0.6353,
822
+ "step": 6700
823
+ },
824
+ {
825
+ "epoch": 0.55,
826
+ "learning_rate": 1.6333116036505868e-05,
827
+ "loss": 0.6013,
828
+ "step": 6750
829
+ },
830
+ {
831
+ "epoch": 0.55,
832
+ "learning_rate": 1.630595393307258e-05,
833
+ "loss": 0.6112,
834
+ "step": 6800
835
+ },
836
+ {
837
+ "epoch": 0.56,
838
+ "learning_rate": 1.6278791829639288e-05,
839
+ "loss": 0.6161,
840
+ "step": 6850
841
+ },
842
+ {
843
+ "epoch": 0.56,
844
+ "learning_rate": 1.6251629726206e-05,
845
+ "loss": 0.6173,
846
+ "step": 6900
847
+ },
848
+ {
849
+ "epoch": 0.57,
850
+ "learning_rate": 1.6224467622772707e-05,
851
+ "loss": 0.6203,
852
+ "step": 6950
853
+ },
854
+ {
855
+ "epoch": 0.57,
856
+ "learning_rate": 1.619730551933942e-05,
857
+ "loss": 0.6264,
858
+ "step": 7000
859
+ },
860
+ {
861
+ "epoch": 0.57,
862
+ "learning_rate": 1.617014341590613e-05,
863
+ "loss": 0.6104,
864
+ "step": 7050
865
+ },
866
+ {
867
+ "epoch": 0.58,
868
+ "learning_rate": 1.614298131247284e-05,
869
+ "loss": 0.5963,
870
+ "step": 7100
871
+ },
872
+ {
873
+ "epoch": 0.58,
874
+ "learning_rate": 1.611581920903955e-05,
875
+ "loss": 0.6044,
876
+ "step": 7150
877
+ },
878
+ {
879
+ "epoch": 0.59,
880
+ "learning_rate": 1.608865710560626e-05,
881
+ "loss": 0.601,
882
+ "step": 7200
883
+ },
884
+ {
885
+ "epoch": 0.59,
886
+ "learning_rate": 1.606149500217297e-05,
887
+ "loss": 0.5882,
888
+ "step": 7250
889
+ },
890
+ {
891
+ "epoch": 0.59,
892
+ "learning_rate": 1.6034332898739678e-05,
893
+ "loss": 0.5976,
894
+ "step": 7300
895
+ },
896
+ {
897
+ "epoch": 0.6,
898
+ "learning_rate": 1.600717079530639e-05,
899
+ "loss": 0.6077,
900
+ "step": 7350
901
+ },
902
+ {
903
+ "epoch": 0.6,
904
+ "learning_rate": 1.5980008691873098e-05,
905
+ "loss": 0.6079,
906
+ "step": 7400
907
+ },
908
+ {
909
+ "epoch": 0.61,
910
+ "learning_rate": 1.595284658843981e-05,
911
+ "loss": 0.6181,
912
+ "step": 7450
913
+ },
914
+ {
915
+ "epoch": 0.61,
916
+ "learning_rate": 1.592568448500652e-05,
917
+ "loss": 0.6186,
918
+ "step": 7500
919
+ },
920
+ {
921
+ "epoch": 0.62,
922
+ "learning_rate": 1.589852238157323e-05,
923
+ "loss": 0.6134,
924
+ "step": 7550
925
+ },
926
+ {
927
+ "epoch": 0.62,
928
+ "learning_rate": 1.587136027813994e-05,
929
+ "loss": 0.6042,
930
+ "step": 7600
931
+ },
932
+ {
933
+ "epoch": 0.62,
934
+ "learning_rate": 1.5844198174706652e-05,
935
+ "loss": 0.621,
936
+ "step": 7650
937
+ },
938
+ {
939
+ "epoch": 0.63,
940
+ "learning_rate": 1.5817036071273363e-05,
941
+ "loss": 0.6249,
942
+ "step": 7700
943
+ },
944
+ {
945
+ "epoch": 0.63,
946
+ "learning_rate": 1.578987396784007e-05,
947
+ "loss": 0.6148,
948
+ "step": 7750
949
+ },
950
+ {
951
+ "epoch": 0.64,
952
+ "learning_rate": 1.576271186440678e-05,
953
+ "loss": 0.6052,
954
+ "step": 7800
955
+ },
956
+ {
957
+ "epoch": 0.64,
958
+ "learning_rate": 1.573554976097349e-05,
959
+ "loss": 0.6207,
960
+ "step": 7850
961
+ },
962
+ {
963
+ "epoch": 0.64,
964
+ "learning_rate": 1.57083876575402e-05,
965
+ "loss": 0.575,
966
+ "step": 7900
967
+ },
968
+ {
969
+ "epoch": 0.65,
970
+ "learning_rate": 1.568122555410691e-05,
971
+ "loss": 0.6081,
972
+ "step": 7950
973
+ },
974
+ {
975
+ "epoch": 0.65,
976
+ "learning_rate": 1.5654063450673622e-05,
977
+ "loss": 0.6148,
978
+ "step": 8000
979
+ },
980
+ {
981
+ "epoch": 0.66,
982
+ "learning_rate": 1.562690134724033e-05,
983
+ "loss": 0.603,
984
+ "step": 8050
985
+ },
986
+ {
987
+ "epoch": 0.66,
988
+ "learning_rate": 1.5599739243807042e-05,
989
+ "loss": 0.5894,
990
+ "step": 8100
991
+ },
992
+ {
993
+ "epoch": 0.66,
994
+ "learning_rate": 1.5572577140373754e-05,
995
+ "loss": 0.6078,
996
+ "step": 8150
997
+ },
998
+ {
999
+ "epoch": 0.67,
1000
+ "learning_rate": 1.5545415036940462e-05,
1001
+ "loss": 0.5784,
1002
+ "step": 8200
1003
+ },
1004
+ {
1005
+ "epoch": 0.67,
1006
+ "learning_rate": 1.5518252933507173e-05,
1007
+ "loss": 0.5849,
1008
+ "step": 8250
1009
+ },
1010
+ {
1011
+ "epoch": 0.68,
1012
+ "learning_rate": 1.549109083007388e-05,
1013
+ "loss": 0.6066,
1014
+ "step": 8300
1015
+ },
1016
+ {
1017
+ "epoch": 0.68,
1018
+ "learning_rate": 1.5463928726640593e-05,
1019
+ "loss": 0.599,
1020
+ "step": 8350
1021
+ },
1022
+ {
1023
+ "epoch": 0.68,
1024
+ "learning_rate": 1.54367666232073e-05,
1025
+ "loss": 0.6186,
1026
+ "step": 8400
1027
+ },
1028
+ {
1029
+ "epoch": 0.69,
1030
+ "learning_rate": 1.5409604519774013e-05,
1031
+ "loss": 0.5947,
1032
+ "step": 8450
1033
+ },
1034
+ {
1035
+ "epoch": 0.69,
1036
+ "learning_rate": 1.5382442416340724e-05,
1037
+ "loss": 0.6191,
1038
+ "step": 8500
1039
+ },
1040
+ {
1041
+ "epoch": 0.7,
1042
+ "learning_rate": 1.5355280312907432e-05,
1043
+ "loss": 0.5896,
1044
+ "step": 8550
1045
+ },
1046
+ {
1047
+ "epoch": 0.7,
1048
+ "learning_rate": 1.5328118209474144e-05,
1049
+ "loss": 0.6241,
1050
+ "step": 8600
1051
+ },
1052
+ {
1053
+ "epoch": 0.7,
1054
+ "learning_rate": 1.5300956106040855e-05,
1055
+ "loss": 0.5856,
1056
+ "step": 8650
1057
+ },
1058
+ {
1059
+ "epoch": 0.71,
1060
+ "learning_rate": 1.5273794002607563e-05,
1061
+ "loss": 0.5966,
1062
+ "step": 8700
1063
+ },
1064
+ {
1065
+ "epoch": 0.71,
1066
+ "learning_rate": 1.5246631899174273e-05,
1067
+ "loss": 0.602,
1068
+ "step": 8750
1069
+ },
1070
+ {
1071
+ "epoch": 0.72,
1072
+ "learning_rate": 1.5219469795740985e-05,
1073
+ "loss": 0.6188,
1074
+ "step": 8800
1075
+ },
1076
+ {
1077
+ "epoch": 0.72,
1078
+ "learning_rate": 1.5192307692307693e-05,
1079
+ "loss": 0.6342,
1080
+ "step": 8850
1081
+ },
1082
+ {
1083
+ "epoch": 0.73,
1084
+ "learning_rate": 1.5165145588874404e-05,
1085
+ "loss": 0.605,
1086
+ "step": 8900
1087
+ },
1088
+ {
1089
+ "epoch": 0.73,
1090
+ "learning_rate": 1.5137983485441114e-05,
1091
+ "loss": 0.6138,
1092
+ "step": 8950
1093
+ },
1094
+ {
1095
+ "epoch": 0.73,
1096
+ "learning_rate": 1.5110821382007822e-05,
1097
+ "loss": 0.6043,
1098
+ "step": 9000
1099
+ },
1100
+ {
1101
+ "epoch": 0.74,
1102
+ "learning_rate": 1.5083659278574534e-05,
1103
+ "loss": 0.6095,
1104
+ "step": 9050
1105
+ },
1106
+ {
1107
+ "epoch": 0.74,
1108
+ "learning_rate": 1.5056497175141245e-05,
1109
+ "loss": 0.5755,
1110
+ "step": 9100
1111
+ },
1112
+ {
1113
+ "epoch": 0.75,
1114
+ "learning_rate": 1.5029335071707954e-05,
1115
+ "loss": 0.5987,
1116
+ "step": 9150
1117
+ },
1118
+ {
1119
+ "epoch": 0.75,
1120
+ "learning_rate": 1.5002172968274663e-05,
1121
+ "loss": 0.5747,
1122
+ "step": 9200
1123
+ },
1124
+ {
1125
+ "epoch": 0.75,
1126
+ "learning_rate": 1.4975010864841375e-05,
1127
+ "loss": 0.5762,
1128
+ "step": 9250
1129
+ },
1130
+ {
1131
+ "epoch": 0.76,
1132
+ "learning_rate": 1.4947848761408083e-05,
1133
+ "loss": 0.5679,
1134
+ "step": 9300
1135
+ },
1136
+ {
1137
+ "epoch": 0.76,
1138
+ "learning_rate": 1.4920686657974795e-05,
1139
+ "loss": 0.5724,
1140
+ "step": 9350
1141
+ },
1142
+ {
1143
+ "epoch": 0.77,
1144
+ "learning_rate": 1.4893524554541504e-05,
1145
+ "loss": 0.5789,
1146
+ "step": 9400
1147
+ },
1148
+ {
1149
+ "epoch": 0.77,
1150
+ "learning_rate": 1.4866362451108216e-05,
1151
+ "loss": 0.5888,
1152
+ "step": 9450
1153
+ },
1154
+ {
1155
+ "epoch": 0.77,
1156
+ "learning_rate": 1.4839200347674924e-05,
1157
+ "loss": 0.5821,
1158
+ "step": 9500
1159
+ },
1160
+ {
1161
+ "epoch": 0.78,
1162
+ "learning_rate": 1.4812038244241636e-05,
1163
+ "loss": 0.6436,
1164
+ "step": 9550
1165
+ },
1166
+ {
1167
+ "epoch": 0.78,
1168
+ "learning_rate": 1.4784876140808346e-05,
1169
+ "loss": 0.5867,
1170
+ "step": 9600
1171
+ },
1172
+ {
1173
+ "epoch": 0.79,
1174
+ "learning_rate": 1.4757714037375055e-05,
1175
+ "loss": 0.5537,
1176
+ "step": 9650
1177
+ },
1178
+ {
1179
+ "epoch": 0.79,
1180
+ "learning_rate": 1.4730551933941765e-05,
1181
+ "loss": 0.5978,
1182
+ "step": 9700
1183
+ },
1184
+ {
1185
+ "epoch": 0.79,
1186
+ "learning_rate": 1.4703389830508477e-05,
1187
+ "loss": 0.5499,
1188
+ "step": 9750
1189
+ },
1190
+ {
1191
+ "epoch": 0.8,
1192
+ "learning_rate": 1.4676227727075185e-05,
1193
+ "loss": 0.6219,
1194
+ "step": 9800
1195
+ },
1196
+ {
1197
+ "epoch": 0.8,
1198
+ "learning_rate": 1.4649065623641896e-05,
1199
+ "loss": 0.5923,
1200
+ "step": 9850
1201
+ },
1202
+ {
1203
+ "epoch": 0.81,
1204
+ "learning_rate": 1.4621903520208606e-05,
1205
+ "loss": 0.6236,
1206
+ "step": 9900
1207
+ },
1208
+ {
1209
+ "epoch": 0.81,
1210
+ "learning_rate": 1.4594741416775316e-05,
1211
+ "loss": 0.5732,
1212
+ "step": 9950
1213
+ },
1214
+ {
1215
+ "epoch": 0.81,
1216
+ "learning_rate": 1.4567579313342026e-05,
1217
+ "loss": 0.5945,
1218
+ "step": 10000
1219
+ },
1220
+ {
1221
+ "epoch": 0.81,
1222
+ "eval_accuracy": 0.7680081507896077,
1223
+ "eval_loss": 0.5745856761932373,
1224
+ "eval_runtime": 16.756,
1225
+ "eval_samples_per_second": 585.761,
1226
+ "eval_steps_per_second": 36.644,
1227
+ "step": 10000
1228
+ },
1229
+ {
1230
+ "epoch": 0.82,
1231
+ "learning_rate": 1.4540417209908737e-05,
1232
+ "loss": 0.5909,
1233
+ "step": 10050
1234
+ },
1235
+ {
1236
+ "epoch": 0.82,
1237
+ "learning_rate": 1.4513255106475446e-05,
1238
+ "loss": 0.5937,
1239
+ "step": 10100
1240
+ },
1241
+ {
1242
+ "epoch": 0.83,
1243
+ "learning_rate": 1.4486093003042157e-05,
1244
+ "loss": 0.5916,
1245
+ "step": 10150
1246
+ },
1247
+ {
1248
+ "epoch": 0.83,
1249
+ "learning_rate": 1.4458930899608867e-05,
1250
+ "loss": 0.5777,
1251
+ "step": 10200
1252
+ },
1253
+ {
1254
+ "epoch": 0.84,
1255
+ "learning_rate": 1.4431768796175577e-05,
1256
+ "loss": 0.581,
1257
+ "step": 10250
1258
+ },
1259
+ {
1260
+ "epoch": 0.84,
1261
+ "learning_rate": 1.4404606692742287e-05,
1262
+ "loss": 0.6225,
1263
+ "step": 10300
1264
+ },
1265
+ {
1266
+ "epoch": 0.84,
1267
+ "learning_rate": 1.4377444589308998e-05,
1268
+ "loss": 0.6204,
1269
+ "step": 10350
1270
+ },
1271
+ {
1272
+ "epoch": 0.85,
1273
+ "learning_rate": 1.4350282485875708e-05,
1274
+ "loss": 0.5783,
1275
+ "step": 10400
1276
+ },
1277
+ {
1278
+ "epoch": 0.85,
1279
+ "learning_rate": 1.4323120382442418e-05,
1280
+ "loss": 0.5715,
1281
+ "step": 10450
1282
+ },
1283
+ {
1284
+ "epoch": 0.86,
1285
+ "learning_rate": 1.4295958279009128e-05,
1286
+ "loss": 0.601,
1287
+ "step": 10500
1288
+ },
1289
+ {
1290
+ "epoch": 0.86,
1291
+ "learning_rate": 1.4268796175575839e-05,
1292
+ "loss": 0.5964,
1293
+ "step": 10550
1294
+ },
1295
+ {
1296
+ "epoch": 0.86,
1297
+ "learning_rate": 1.4241634072142547e-05,
1298
+ "loss": 0.6087,
1299
+ "step": 10600
1300
+ },
1301
+ {
1302
+ "epoch": 0.87,
1303
+ "learning_rate": 1.4214471968709259e-05,
1304
+ "loss": 0.5917,
1305
+ "step": 10650
1306
+ },
1307
+ {
1308
+ "epoch": 0.87,
1309
+ "learning_rate": 1.4187309865275969e-05,
1310
+ "loss": 0.5593,
1311
+ "step": 10700
1312
+ },
1313
+ {
1314
+ "epoch": 0.88,
1315
+ "learning_rate": 1.4160147761842677e-05,
1316
+ "loss": 0.6105,
1317
+ "step": 10750
1318
+ },
1319
+ {
1320
+ "epoch": 0.88,
1321
+ "learning_rate": 1.4132985658409388e-05,
1322
+ "loss": 0.573,
1323
+ "step": 10800
1324
+ },
1325
+ {
1326
+ "epoch": 0.88,
1327
+ "learning_rate": 1.41058235549761e-05,
1328
+ "loss": 0.615,
1329
+ "step": 10850
1330
+ },
1331
+ {
1332
+ "epoch": 0.89,
1333
+ "learning_rate": 1.4078661451542808e-05,
1334
+ "loss": 0.5854,
1335
+ "step": 10900
1336
+ },
1337
+ {
1338
+ "epoch": 0.89,
1339
+ "learning_rate": 1.4051499348109518e-05,
1340
+ "loss": 0.5828,
1341
+ "step": 10950
1342
+ },
1343
+ {
1344
+ "epoch": 0.9,
1345
+ "learning_rate": 1.402433724467623e-05,
1346
+ "loss": 0.5892,
1347
+ "step": 11000
1348
+ },
1349
+ {
1350
+ "epoch": 0.9,
1351
+ "learning_rate": 1.3997175141242937e-05,
1352
+ "loss": 0.59,
1353
+ "step": 11050
1354
+ },
1355
+ {
1356
+ "epoch": 0.9,
1357
+ "learning_rate": 1.3970013037809649e-05,
1358
+ "loss": 0.5789,
1359
+ "step": 11100
1360
+ },
1361
+ {
1362
+ "epoch": 0.91,
1363
+ "learning_rate": 1.3942850934376359e-05,
1364
+ "loss": 0.5549,
1365
+ "step": 11150
1366
+ },
1367
+ {
1368
+ "epoch": 0.91,
1369
+ "learning_rate": 1.391568883094307e-05,
1370
+ "loss": 0.6024,
1371
+ "step": 11200
1372
+ },
1373
+ {
1374
+ "epoch": 0.92,
1375
+ "learning_rate": 1.3888526727509778e-05,
1376
+ "loss": 0.5597,
1377
+ "step": 11250
1378
+ },
1379
+ {
1380
+ "epoch": 0.92,
1381
+ "learning_rate": 1.386136462407649e-05,
1382
+ "loss": 0.5646,
1383
+ "step": 11300
1384
+ },
1385
+ {
1386
+ "epoch": 0.92,
1387
+ "learning_rate": 1.38342025206432e-05,
1388
+ "loss": 0.5711,
1389
+ "step": 11350
1390
+ },
1391
+ {
1392
+ "epoch": 0.93,
1393
+ "learning_rate": 1.380704041720991e-05,
1394
+ "loss": 0.5555,
1395
+ "step": 11400
1396
+ },
1397
+ {
1398
+ "epoch": 0.93,
1399
+ "learning_rate": 1.377987831377662e-05,
1400
+ "loss": 0.5504,
1401
+ "step": 11450
1402
+ },
1403
+ {
1404
+ "epoch": 0.94,
1405
+ "learning_rate": 1.3752716210343331e-05,
1406
+ "loss": 0.5712,
1407
+ "step": 11500
1408
+ },
1409
+ {
1410
+ "epoch": 0.94,
1411
+ "learning_rate": 1.372555410691004e-05,
1412
+ "loss": 0.5768,
1413
+ "step": 11550
1414
+ },
1415
+ {
1416
+ "epoch": 0.95,
1417
+ "learning_rate": 1.369839200347675e-05,
1418
+ "loss": 0.5877,
1419
+ "step": 11600
1420
+ },
1421
+ {
1422
+ "epoch": 0.95,
1423
+ "learning_rate": 1.367122990004346e-05,
1424
+ "loss": 0.5639,
1425
+ "step": 11650
1426
+ },
1427
+ {
1428
+ "epoch": 0.95,
1429
+ "learning_rate": 1.364406779661017e-05,
1430
+ "loss": 0.5803,
1431
+ "step": 11700
1432
+ },
1433
+ {
1434
+ "epoch": 0.96,
1435
+ "learning_rate": 1.361690569317688e-05,
1436
+ "loss": 0.602,
1437
+ "step": 11750
1438
+ },
1439
+ {
1440
+ "epoch": 0.96,
1441
+ "learning_rate": 1.3589743589743592e-05,
1442
+ "loss": 0.5758,
1443
+ "step": 11800
1444
+ },
1445
+ {
1446
+ "epoch": 0.97,
1447
+ "learning_rate": 1.35625814863103e-05,
1448
+ "loss": 0.5657,
1449
+ "step": 11850
1450
+ },
1451
+ {
1452
+ "epoch": 0.97,
1453
+ "learning_rate": 1.3535419382877011e-05,
1454
+ "loss": 0.5857,
1455
+ "step": 11900
1456
+ },
1457
+ {
1458
+ "epoch": 0.97,
1459
+ "learning_rate": 1.3508257279443721e-05,
1460
+ "loss": 0.5922,
1461
+ "step": 11950
1462
+ },
1463
+ {
1464
+ "epoch": 0.98,
1465
+ "learning_rate": 1.3481095176010431e-05,
1466
+ "loss": 0.6179,
1467
+ "step": 12000
1468
+ },
1469
+ {
1470
+ "epoch": 0.98,
1471
+ "learning_rate": 1.3453933072577141e-05,
1472
+ "loss": 0.5464,
1473
+ "step": 12050
1474
+ },
1475
+ {
1476
+ "epoch": 0.99,
1477
+ "learning_rate": 1.3426770969143852e-05,
1478
+ "loss": 0.5578,
1479
+ "step": 12100
1480
+ },
1481
+ {
1482
+ "epoch": 0.99,
1483
+ "learning_rate": 1.3399608865710562e-05,
1484
+ "loss": 0.6018,
1485
+ "step": 12150
1486
+ },
1487
+ {
1488
+ "epoch": 0.99,
1489
+ "learning_rate": 1.3372446762277272e-05,
1490
+ "loss": 0.5907,
1491
+ "step": 12200
1492
+ },
1493
+ {
1494
+ "epoch": 1.0,
1495
+ "learning_rate": 1.3345284658843982e-05,
1496
+ "loss": 0.5403,
1497
+ "step": 12250
1498
+ },
1499
+ {
1500
+ "epoch": 1.0,
1501
+ "learning_rate": 1.3318122555410693e-05,
1502
+ "loss": 0.5376,
1503
+ "step": 12300
1504
+ },
1505
+ {
1506
+ "epoch": 1.01,
1507
+ "learning_rate": 1.3290960451977402e-05,
1508
+ "loss": 0.5479,
1509
+ "step": 12350
1510
+ },
1511
+ {
1512
+ "epoch": 1.01,
1513
+ "learning_rate": 1.3263798348544113e-05,
1514
+ "loss": 0.5121,
1515
+ "step": 12400
1516
+ },
1517
+ {
1518
+ "epoch": 1.01,
1519
+ "learning_rate": 1.3236636245110823e-05,
1520
+ "loss": 0.4965,
1521
+ "step": 12450
1522
+ },
1523
+ {
1524
+ "epoch": 1.02,
1525
+ "learning_rate": 1.3209474141677531e-05,
1526
+ "loss": 0.481,
1527
+ "step": 12500
1528
+ },
1529
+ {
1530
+ "epoch": 1.02,
1531
+ "learning_rate": 1.3182312038244243e-05,
1532
+ "loss": 0.4933,
1533
+ "step": 12550
1534
+ },
1535
+ {
1536
+ "epoch": 1.03,
1537
+ "learning_rate": 1.3155149934810954e-05,
1538
+ "loss": 0.4907,
1539
+ "step": 12600
1540
+ },
1541
+ {
1542
+ "epoch": 1.03,
1543
+ "learning_rate": 1.3127987831377662e-05,
1544
+ "loss": 0.5178,
1545
+ "step": 12650
1546
+ },
1547
+ {
1548
+ "epoch": 1.03,
1549
+ "learning_rate": 1.3100825727944372e-05,
1550
+ "loss": 0.5114,
1551
+ "step": 12700
1552
+ },
1553
+ {
1554
+ "epoch": 1.04,
1555
+ "learning_rate": 1.3073663624511084e-05,
1556
+ "loss": 0.4874,
1557
+ "step": 12750
1558
+ },
1559
+ {
1560
+ "epoch": 1.04,
1561
+ "learning_rate": 1.3046501521077792e-05,
1562
+ "loss": 0.5061,
1563
+ "step": 12800
1564
+ },
1565
+ {
1566
+ "epoch": 1.05,
1567
+ "learning_rate": 1.3019339417644503e-05,
1568
+ "loss": 0.5151,
1569
+ "step": 12850
1570
+ },
1571
+ {
1572
+ "epoch": 1.05,
1573
+ "learning_rate": 1.2992177314211213e-05,
1574
+ "loss": 0.5208,
1575
+ "step": 12900
1576
+ },
1577
+ {
1578
+ "epoch": 1.06,
1579
+ "learning_rate": 1.2965015210777925e-05,
1580
+ "loss": 0.4972,
1581
+ "step": 12950
1582
+ },
1583
+ {
1584
+ "epoch": 1.06,
1585
+ "learning_rate": 1.2937853107344633e-05,
1586
+ "loss": 0.4916,
1587
+ "step": 13000
1588
+ },
1589
+ {
1590
+ "epoch": 1.06,
1591
+ "learning_rate": 1.2910691003911344e-05,
1592
+ "loss": 0.5134,
1593
+ "step": 13050
1594
+ },
1595
+ {
1596
+ "epoch": 1.07,
1597
+ "learning_rate": 1.2883528900478056e-05,
1598
+ "loss": 0.5218,
1599
+ "step": 13100
1600
+ },
1601
+ {
1602
+ "epoch": 1.07,
1603
+ "learning_rate": 1.2856366797044764e-05,
1604
+ "loss": 0.5096,
1605
+ "step": 13150
1606
+ },
1607
+ {
1608
+ "epoch": 1.08,
1609
+ "learning_rate": 1.2829204693611474e-05,
1610
+ "loss": 0.5038,
1611
+ "step": 13200
1612
+ },
1613
+ {
1614
+ "epoch": 1.08,
1615
+ "learning_rate": 1.2802042590178185e-05,
1616
+ "loss": 0.4812,
1617
+ "step": 13250
1618
+ },
1619
+ {
1620
+ "epoch": 1.08,
1621
+ "learning_rate": 1.2774880486744893e-05,
1622
+ "loss": 0.5195,
1623
+ "step": 13300
1624
+ },
1625
+ {
1626
+ "epoch": 1.09,
1627
+ "learning_rate": 1.2747718383311605e-05,
1628
+ "loss": 0.5023,
1629
+ "step": 13350
1630
+ },
1631
+ {
1632
+ "epoch": 1.09,
1633
+ "learning_rate": 1.2720556279878315e-05,
1634
+ "loss": 0.5112,
1635
+ "step": 13400
1636
+ },
1637
+ {
1638
+ "epoch": 1.1,
1639
+ "learning_rate": 1.2693394176445025e-05,
1640
+ "loss": 0.5131,
1641
+ "step": 13450
1642
+ },
1643
+ {
1644
+ "epoch": 1.1,
1645
+ "learning_rate": 1.2666232073011735e-05,
1646
+ "loss": 0.5317,
1647
+ "step": 13500
1648
+ },
1649
+ {
1650
+ "epoch": 1.1,
1651
+ "learning_rate": 1.2639069969578446e-05,
1652
+ "loss": 0.4836,
1653
+ "step": 13550
1654
+ },
1655
+ {
1656
+ "epoch": 1.11,
1657
+ "learning_rate": 1.2611907866145154e-05,
1658
+ "loss": 0.5583,
1659
+ "step": 13600
1660
+ },
1661
+ {
1662
+ "epoch": 1.11,
1663
+ "learning_rate": 1.2584745762711866e-05,
1664
+ "loss": 0.5195,
1665
+ "step": 13650
1666
+ },
1667
+ {
1668
+ "epoch": 1.12,
1669
+ "learning_rate": 1.2557583659278576e-05,
1670
+ "loss": 0.4933,
1671
+ "step": 13700
1672
+ },
1673
+ {
1674
+ "epoch": 1.12,
1675
+ "learning_rate": 1.2530421555845285e-05,
1676
+ "loss": 0.5127,
1677
+ "step": 13750
1678
+ },
1679
+ {
1680
+ "epoch": 1.12,
1681
+ "learning_rate": 1.2503259452411995e-05,
1682
+ "loss": 0.4924,
1683
+ "step": 13800
1684
+ },
1685
+ {
1686
+ "epoch": 1.13,
1687
+ "learning_rate": 1.2476097348978707e-05,
1688
+ "loss": 0.4972,
1689
+ "step": 13850
1690
+ },
1691
+ {
1692
+ "epoch": 1.13,
1693
+ "learning_rate": 1.2448935245545417e-05,
1694
+ "loss": 0.5336,
1695
+ "step": 13900
1696
+ },
1697
+ {
1698
+ "epoch": 1.14,
1699
+ "learning_rate": 1.2421773142112126e-05,
1700
+ "loss": 0.4995,
1701
+ "step": 13950
1702
+ },
1703
+ {
1704
+ "epoch": 1.14,
1705
+ "learning_rate": 1.2394611038678836e-05,
1706
+ "loss": 0.4911,
1707
+ "step": 14000
1708
+ },
1709
+ {
1710
+ "epoch": 1.14,
1711
+ "learning_rate": 1.2367448935245548e-05,
1712
+ "loss": 0.5177,
1713
+ "step": 14050
1714
+ },
1715
+ {
1716
+ "epoch": 1.15,
1717
+ "learning_rate": 1.2340286831812256e-05,
1718
+ "loss": 0.4603,
1719
+ "step": 14100
1720
+ },
1721
+ {
1722
+ "epoch": 1.15,
1723
+ "learning_rate": 1.2313124728378967e-05,
1724
+ "loss": 0.5058,
1725
+ "step": 14150
1726
+ },
1727
+ {
1728
+ "epoch": 1.16,
1729
+ "learning_rate": 1.2285962624945677e-05,
1730
+ "loss": 0.5239,
1731
+ "step": 14200
1732
+ },
1733
+ {
1734
+ "epoch": 1.16,
1735
+ "learning_rate": 1.2258800521512385e-05,
1736
+ "loss": 0.5143,
1737
+ "step": 14250
1738
+ },
1739
+ {
1740
+ "epoch": 1.17,
1741
+ "learning_rate": 1.2231638418079097e-05,
1742
+ "loss": 0.4909,
1743
+ "step": 14300
1744
+ },
1745
+ {
1746
+ "epoch": 1.17,
1747
+ "learning_rate": 1.2204476314645808e-05,
1748
+ "loss": 0.5104,
1749
+ "step": 14350
1750
+ },
1751
+ {
1752
+ "epoch": 1.17,
1753
+ "learning_rate": 1.2177314211212517e-05,
1754
+ "loss": 0.5263,
1755
+ "step": 14400
1756
+ },
1757
+ {
1758
+ "epoch": 1.18,
1759
+ "learning_rate": 1.2150152107779226e-05,
1760
+ "loss": 0.4828,
1761
+ "step": 14450
1762
+ },
1763
+ {
1764
+ "epoch": 1.18,
1765
+ "learning_rate": 1.2122990004345938e-05,
1766
+ "loss": 0.4891,
1767
+ "step": 14500
1768
+ },
1769
+ {
1770
+ "epoch": 1.19,
1771
+ "learning_rate": 1.2095827900912646e-05,
1772
+ "loss": 0.5003,
1773
+ "step": 14550
1774
+ },
1775
+ {
1776
+ "epoch": 1.19,
1777
+ "learning_rate": 1.2068665797479358e-05,
1778
+ "loss": 0.4827,
1779
+ "step": 14600
1780
+ },
1781
+ {
1782
+ "epoch": 1.19,
1783
+ "learning_rate": 1.204150369404607e-05,
1784
+ "loss": 0.5017,
1785
+ "step": 14650
1786
+ },
1787
+ {
1788
+ "epoch": 1.2,
1789
+ "learning_rate": 1.2014341590612777e-05,
1790
+ "loss": 0.525,
1791
+ "step": 14700
1792
+ },
1793
+ {
1794
+ "epoch": 1.2,
1795
+ "learning_rate": 1.1987179487179487e-05,
1796
+ "loss": 0.5077,
1797
+ "step": 14750
1798
+ },
1799
+ {
1800
+ "epoch": 1.21,
1801
+ "learning_rate": 1.1960017383746199e-05,
1802
+ "loss": 0.5062,
1803
+ "step": 14800
1804
+ },
1805
+ {
1806
+ "epoch": 1.21,
1807
+ "learning_rate": 1.193285528031291e-05,
1808
+ "loss": 0.5098,
1809
+ "step": 14850
1810
+ },
1811
+ {
1812
+ "epoch": 1.21,
1813
+ "learning_rate": 1.1905693176879618e-05,
1814
+ "loss": 0.5366,
1815
+ "step": 14900
1816
+ },
1817
+ {
1818
+ "epoch": 1.22,
1819
+ "learning_rate": 1.1878531073446328e-05,
1820
+ "loss": 0.5097,
1821
+ "step": 14950
1822
+ },
1823
+ {
1824
+ "epoch": 1.22,
1825
+ "learning_rate": 1.185136897001304e-05,
1826
+ "loss": 0.4847,
1827
+ "step": 15000
1828
+ },
1829
+ {
1830
+ "epoch": 1.22,
1831
+ "eval_accuracy": 0.7772796739684157,
1832
+ "eval_loss": 0.5816997289657593,
1833
+ "eval_runtime": 16.7546,
1834
+ "eval_samples_per_second": 585.81,
1835
+ "eval_steps_per_second": 36.647,
1836
+ "step": 15000
1837
+ },
1838
+ {
1839
+ "epoch": 1.23,
1840
+ "learning_rate": 1.1824206866579748e-05,
1841
+ "loss": 0.5372,
1842
+ "step": 15050
1843
+ },
1844
+ {
1845
+ "epoch": 1.23,
1846
+ "learning_rate": 1.179704476314646e-05,
1847
+ "loss": 0.4814,
1848
+ "step": 15100
1849
+ },
1850
+ {
1851
+ "epoch": 1.23,
1852
+ "learning_rate": 1.176988265971317e-05,
1853
+ "loss": 0.4823,
1854
+ "step": 15150
1855
+ },
1856
+ {
1857
+ "epoch": 1.24,
1858
+ "learning_rate": 1.1742720556279879e-05,
1859
+ "loss": 0.5288,
1860
+ "step": 15200
1861
+ },
1862
+ {
1863
+ "epoch": 1.24,
1864
+ "learning_rate": 1.1715558452846589e-05,
1865
+ "loss": 0.5211,
1866
+ "step": 15250
1867
+ },
1868
+ {
1869
+ "epoch": 1.25,
1870
+ "learning_rate": 1.16883963494133e-05,
1871
+ "loss": 0.4856,
1872
+ "step": 15300
1873
+ },
1874
+ {
1875
+ "epoch": 1.25,
1876
+ "learning_rate": 1.1661234245980009e-05,
1877
+ "loss": 0.5162,
1878
+ "step": 15350
1879
+ },
1880
+ {
1881
+ "epoch": 1.25,
1882
+ "learning_rate": 1.163407214254672e-05,
1883
+ "loss": 0.5032,
1884
+ "step": 15400
1885
+ },
1886
+ {
1887
+ "epoch": 1.26,
1888
+ "learning_rate": 1.160691003911343e-05,
1889
+ "loss": 0.5181,
1890
+ "step": 15450
1891
+ },
1892
+ {
1893
+ "epoch": 1.26,
1894
+ "learning_rate": 1.157974793568014e-05,
1895
+ "loss": 0.5042,
1896
+ "step": 15500
1897
+ },
1898
+ {
1899
+ "epoch": 1.27,
1900
+ "learning_rate": 1.155258583224685e-05,
1901
+ "loss": 0.4901,
1902
+ "step": 15550
1903
+ },
1904
+ {
1905
+ "epoch": 1.27,
1906
+ "learning_rate": 1.1525423728813561e-05,
1907
+ "loss": 0.5109,
1908
+ "step": 15600
1909
+ },
1910
+ {
1911
+ "epoch": 1.28,
1912
+ "learning_rate": 1.1498261625380271e-05,
1913
+ "loss": 0.5227,
1914
+ "step": 15650
1915
+ },
1916
+ {
1917
+ "epoch": 1.28,
1918
+ "learning_rate": 1.147109952194698e-05,
1919
+ "loss": 0.4899,
1920
+ "step": 15700
1921
+ },
1922
+ {
1923
+ "epoch": 1.28,
1924
+ "learning_rate": 1.144393741851369e-05,
1925
+ "loss": 0.4738,
1926
+ "step": 15750
1927
+ },
1928
+ {
1929
+ "epoch": 1.29,
1930
+ "learning_rate": 1.1416775315080402e-05,
1931
+ "loss": 0.5072,
1932
+ "step": 15800
1933
+ },
1934
+ {
1935
+ "epoch": 1.29,
1936
+ "learning_rate": 1.138961321164711e-05,
1937
+ "loss": 0.507,
1938
+ "step": 15850
1939
+ },
1940
+ {
1941
+ "epoch": 1.3,
1942
+ "learning_rate": 1.1362451108213822e-05,
1943
+ "loss": 0.5224,
1944
+ "step": 15900
1945
+ },
1946
+ {
1947
+ "epoch": 1.3,
1948
+ "learning_rate": 1.1335289004780532e-05,
1949
+ "loss": 0.5032,
1950
+ "step": 15950
1951
+ },
1952
+ {
1953
+ "epoch": 1.3,
1954
+ "learning_rate": 1.130812690134724e-05,
1955
+ "loss": 0.4855,
1956
+ "step": 16000
1957
+ },
1958
+ {
1959
+ "epoch": 1.31,
1960
+ "learning_rate": 1.1280964797913951e-05,
1961
+ "loss": 0.4908,
1962
+ "step": 16050
1963
+ },
1964
+ {
1965
+ "epoch": 1.31,
1966
+ "learning_rate": 1.1253802694480663e-05,
1967
+ "loss": 0.5129,
1968
+ "step": 16100
1969
+ },
1970
+ {
1971
+ "epoch": 1.32,
1972
+ "learning_rate": 1.1226640591047371e-05,
1973
+ "loss": 0.5153,
1974
+ "step": 16150
1975
+ },
1976
+ {
1977
+ "epoch": 1.32,
1978
+ "learning_rate": 1.1199478487614082e-05,
1979
+ "loss": 0.5167,
1980
+ "step": 16200
1981
+ },
1982
+ {
1983
+ "epoch": 1.32,
1984
+ "learning_rate": 1.1172316384180792e-05,
1985
+ "loss": 0.4818,
1986
+ "step": 16250
1987
+ },
1988
+ {
1989
+ "epoch": 1.33,
1990
+ "learning_rate": 1.11451542807475e-05,
1991
+ "loss": 0.5057,
1992
+ "step": 16300
1993
+ },
1994
+ {
1995
+ "epoch": 1.33,
1996
+ "learning_rate": 1.1117992177314212e-05,
1997
+ "loss": 0.4963,
1998
+ "step": 16350
1999
+ },
2000
+ {
2001
+ "epoch": 1.34,
2002
+ "learning_rate": 1.1090830073880923e-05,
2003
+ "loss": 0.4832,
2004
+ "step": 16400
2005
+ },
2006
+ {
2007
+ "epoch": 1.34,
2008
+ "learning_rate": 1.1063667970447632e-05,
2009
+ "loss": 0.4967,
2010
+ "step": 16450
2011
+ },
2012
+ {
2013
+ "epoch": 1.34,
2014
+ "learning_rate": 1.1036505867014341e-05,
2015
+ "loss": 0.4924,
2016
+ "step": 16500
2017
+ },
2018
+ {
2019
+ "epoch": 1.35,
2020
+ "learning_rate": 1.1009343763581053e-05,
2021
+ "loss": 0.4857,
2022
+ "step": 16550
2023
+ },
2024
+ {
2025
+ "epoch": 1.35,
2026
+ "learning_rate": 1.0982181660147765e-05,
2027
+ "loss": 0.4981,
2028
+ "step": 16600
2029
+ },
2030
+ {
2031
+ "epoch": 1.36,
2032
+ "learning_rate": 1.0955019556714473e-05,
2033
+ "loss": 0.484,
2034
+ "step": 16650
2035
+ },
2036
+ {
2037
+ "epoch": 1.36,
2038
+ "learning_rate": 1.0927857453281182e-05,
2039
+ "loss": 0.5214,
2040
+ "step": 16700
2041
+ },
2042
+ {
2043
+ "epoch": 1.36,
2044
+ "learning_rate": 1.0900695349847894e-05,
2045
+ "loss": 0.5174,
2046
+ "step": 16750
2047
+ },
2048
+ {
2049
+ "epoch": 1.37,
2050
+ "learning_rate": 1.0873533246414602e-05,
2051
+ "loss": 0.5203,
2052
+ "step": 16800
2053
+ },
2054
+ {
2055
+ "epoch": 1.37,
2056
+ "learning_rate": 1.0846371142981314e-05,
2057
+ "loss": 0.4865,
2058
+ "step": 16850
2059
+ },
2060
+ {
2061
+ "epoch": 1.38,
2062
+ "learning_rate": 1.0819209039548024e-05,
2063
+ "loss": 0.501,
2064
+ "step": 16900
2065
+ },
2066
+ {
2067
+ "epoch": 1.38,
2068
+ "learning_rate": 1.0792046936114733e-05,
2069
+ "loss": 0.4807,
2070
+ "step": 16950
2071
+ },
2072
+ {
2073
+ "epoch": 1.39,
2074
+ "learning_rate": 1.0764884832681443e-05,
2075
+ "loss": 0.5103,
2076
+ "step": 17000
2077
+ },
2078
+ {
2079
+ "epoch": 1.39,
2080
+ "learning_rate": 1.0737722729248155e-05,
2081
+ "loss": 0.5467,
2082
+ "step": 17050
2083
+ },
2084
+ {
2085
+ "epoch": 1.39,
2086
+ "learning_rate": 1.0710560625814863e-05,
2087
+ "loss": 0.4653,
2088
+ "step": 17100
2089
+ },
2090
+ {
2091
+ "epoch": 1.4,
2092
+ "learning_rate": 1.0683398522381574e-05,
2093
+ "loss": 0.4821,
2094
+ "step": 17150
2095
+ },
2096
+ {
2097
+ "epoch": 1.4,
2098
+ "learning_rate": 1.0656236418948284e-05,
2099
+ "loss": 0.5136,
2100
+ "step": 17200
2101
+ },
2102
+ {
2103
+ "epoch": 1.41,
2104
+ "learning_rate": 1.0629074315514994e-05,
2105
+ "loss": 0.4993,
2106
+ "step": 17250
2107
+ },
2108
+ {
2109
+ "epoch": 1.41,
2110
+ "learning_rate": 1.0601912212081704e-05,
2111
+ "loss": 0.5035,
2112
+ "step": 17300
2113
+ },
2114
+ {
2115
+ "epoch": 1.41,
2116
+ "learning_rate": 1.0574750108648415e-05,
2117
+ "loss": 0.4993,
2118
+ "step": 17350
2119
+ },
2120
+ {
2121
+ "epoch": 1.42,
2122
+ "learning_rate": 1.0547588005215125e-05,
2123
+ "loss": 0.5187,
2124
+ "step": 17400
2125
+ },
2126
+ {
2127
+ "epoch": 1.42,
2128
+ "learning_rate": 1.0520425901781835e-05,
2129
+ "loss": 0.507,
2130
+ "step": 17450
2131
+ },
2132
+ {
2133
+ "epoch": 1.43,
2134
+ "learning_rate": 1.0493263798348545e-05,
2135
+ "loss": 0.4857,
2136
+ "step": 17500
2137
+ },
2138
+ {
2139
+ "epoch": 1.43,
2140
+ "learning_rate": 1.0466101694915256e-05,
2141
+ "loss": 0.5085,
2142
+ "step": 17550
2143
+ },
2144
+ {
2145
+ "epoch": 1.43,
2146
+ "learning_rate": 1.0438939591481965e-05,
2147
+ "loss": 0.5106,
2148
+ "step": 17600
2149
+ },
2150
+ {
2151
+ "epoch": 1.44,
2152
+ "learning_rate": 1.0411777488048676e-05,
2153
+ "loss": 0.5152,
2154
+ "step": 17650
2155
+ },
2156
+ {
2157
+ "epoch": 1.44,
2158
+ "learning_rate": 1.0384615384615386e-05,
2159
+ "loss": 0.5066,
2160
+ "step": 17700
2161
+ },
2162
+ {
2163
+ "epoch": 1.45,
2164
+ "learning_rate": 1.0357453281182096e-05,
2165
+ "loss": 0.5097,
2166
+ "step": 17750
2167
+ },
2168
+ {
2169
+ "epoch": 1.45,
2170
+ "learning_rate": 1.0330291177748806e-05,
2171
+ "loss": 0.5447,
2172
+ "step": 17800
2173
+ },
2174
+ {
2175
+ "epoch": 1.45,
2176
+ "learning_rate": 1.0303129074315517e-05,
2177
+ "loss": 0.5142,
2178
+ "step": 17850
2179
+ },
2180
+ {
2181
+ "epoch": 1.46,
2182
+ "learning_rate": 1.0275966970882225e-05,
2183
+ "loss": 0.5053,
2184
+ "step": 17900
2185
+ },
2186
+ {
2187
+ "epoch": 1.46,
2188
+ "learning_rate": 1.0248804867448937e-05,
2189
+ "loss": 0.511,
2190
+ "step": 17950
2191
+ },
2192
+ {
2193
+ "epoch": 1.47,
2194
+ "learning_rate": 1.0221642764015647e-05,
2195
+ "loss": 0.5134,
2196
+ "step": 18000
2197
+ },
2198
+ {
2199
+ "epoch": 1.47,
2200
+ "learning_rate": 1.0194480660582355e-05,
2201
+ "loss": 0.5304,
2202
+ "step": 18050
2203
+ },
2204
+ {
2205
+ "epoch": 1.47,
2206
+ "learning_rate": 1.0167318557149066e-05,
2207
+ "loss": 0.4506,
2208
+ "step": 18100
2209
+ },
2210
+ {
2211
+ "epoch": 1.48,
2212
+ "learning_rate": 1.0140156453715778e-05,
2213
+ "loss": 0.5318,
2214
+ "step": 18150
2215
+ },
2216
+ {
2217
+ "epoch": 1.48,
2218
+ "learning_rate": 1.0112994350282486e-05,
2219
+ "loss": 0.4923,
2220
+ "step": 18200
2221
+ },
2222
+ {
2223
+ "epoch": 1.49,
2224
+ "learning_rate": 1.0085832246849196e-05,
2225
+ "loss": 0.4903,
2226
+ "step": 18250
2227
+ },
2228
+ {
2229
+ "epoch": 1.49,
2230
+ "learning_rate": 1.0058670143415907e-05,
2231
+ "loss": 0.4626,
2232
+ "step": 18300
2233
+ },
2234
+ {
2235
+ "epoch": 1.5,
2236
+ "learning_rate": 1.0031508039982619e-05,
2237
+ "loss": 0.4965,
2238
+ "step": 18350
2239
+ },
2240
+ {
2241
+ "epoch": 1.5,
2242
+ "learning_rate": 1.0004345936549327e-05,
2243
+ "loss": 0.4839,
2244
+ "step": 18400
2245
+ },
2246
+ {
2247
+ "epoch": 1.5,
2248
+ "learning_rate": 9.977183833116037e-06,
2249
+ "loss": 0.5238,
2250
+ "step": 18450
2251
+ },
2252
+ {
2253
+ "epoch": 1.51,
2254
+ "learning_rate": 9.950021729682747e-06,
2255
+ "loss": 0.5056,
2256
+ "step": 18500
2257
+ },
2258
+ {
2259
+ "epoch": 1.51,
2260
+ "learning_rate": 9.922859626249458e-06,
2261
+ "loss": 0.5162,
2262
+ "step": 18550
2263
+ },
2264
+ {
2265
+ "epoch": 1.52,
2266
+ "learning_rate": 9.895697522816168e-06,
2267
+ "loss": 0.4718,
2268
+ "step": 18600
2269
+ },
2270
+ {
2271
+ "epoch": 1.52,
2272
+ "learning_rate": 9.868535419382878e-06,
2273
+ "loss": 0.5132,
2274
+ "step": 18650
2275
+ },
2276
+ {
2277
+ "epoch": 1.52,
2278
+ "learning_rate": 9.841373315949588e-06,
2279
+ "loss": 0.5185,
2280
+ "step": 18700
2281
+ },
2282
+ {
2283
+ "epoch": 1.53,
2284
+ "learning_rate": 9.814211212516298e-06,
2285
+ "loss": 0.4927,
2286
+ "step": 18750
2287
+ },
2288
+ {
2289
+ "epoch": 1.53,
2290
+ "learning_rate": 9.787049109083007e-06,
2291
+ "loss": 0.4992,
2292
+ "step": 18800
2293
+ },
2294
+ {
2295
+ "epoch": 1.54,
2296
+ "learning_rate": 9.759887005649719e-06,
2297
+ "loss": 0.4462,
2298
+ "step": 18850
2299
+ },
2300
+ {
2301
+ "epoch": 1.54,
2302
+ "learning_rate": 9.732724902216429e-06,
2303
+ "loss": 0.4955,
2304
+ "step": 18900
2305
+ },
2306
+ {
2307
+ "epoch": 1.54,
2308
+ "learning_rate": 9.705562798783139e-06,
2309
+ "loss": 0.5153,
2310
+ "step": 18950
2311
+ },
2312
+ {
2313
+ "epoch": 1.55,
2314
+ "learning_rate": 9.678400695349848e-06,
2315
+ "loss": 0.4776,
2316
+ "step": 19000
2317
+ },
2318
+ {
2319
+ "epoch": 1.55,
2320
+ "learning_rate": 9.651238591916558e-06,
2321
+ "loss": 0.518,
2322
+ "step": 19050
2323
+ },
2324
+ {
2325
+ "epoch": 1.56,
2326
+ "learning_rate": 9.62407648848327e-06,
2327
+ "loss": 0.5094,
2328
+ "step": 19100
2329
+ },
2330
+ {
2331
+ "epoch": 1.56,
2332
+ "learning_rate": 9.59691438504998e-06,
2333
+ "loss": 0.5092,
2334
+ "step": 19150
2335
+ },
2336
+ {
2337
+ "epoch": 1.56,
2338
+ "learning_rate": 9.56975228161669e-06,
2339
+ "loss": 0.5016,
2340
+ "step": 19200
2341
+ },
2342
+ {
2343
+ "epoch": 1.57,
2344
+ "learning_rate": 9.5425901781834e-06,
2345
+ "loss": 0.5357,
2346
+ "step": 19250
2347
+ },
2348
+ {
2349
+ "epoch": 1.57,
2350
+ "learning_rate": 9.515428074750109e-06,
2351
+ "loss": 0.521,
2352
+ "step": 19300
2353
+ },
2354
+ {
2355
+ "epoch": 1.58,
2356
+ "learning_rate": 9.488265971316819e-06,
2357
+ "loss": 0.488,
2358
+ "step": 19350
2359
+ },
2360
+ {
2361
+ "epoch": 1.58,
2362
+ "learning_rate": 9.46110386788353e-06,
2363
+ "loss": 0.4856,
2364
+ "step": 19400
2365
+ },
2366
+ {
2367
+ "epoch": 1.58,
2368
+ "learning_rate": 9.43394176445024e-06,
2369
+ "loss": 0.506,
2370
+ "step": 19450
2371
+ },
2372
+ {
2373
+ "epoch": 1.59,
2374
+ "learning_rate": 9.40677966101695e-06,
2375
+ "loss": 0.5038,
2376
+ "step": 19500
2377
+ },
2378
+ {
2379
+ "epoch": 1.59,
2380
+ "learning_rate": 9.37961755758366e-06,
2381
+ "loss": 0.471,
2382
+ "step": 19550
2383
+ },
2384
+ {
2385
+ "epoch": 1.6,
2386
+ "learning_rate": 9.35245545415037e-06,
2387
+ "loss": 0.4886,
2388
+ "step": 19600
2389
+ },
2390
+ {
2391
+ "epoch": 1.6,
2392
+ "learning_rate": 9.325293350717081e-06,
2393
+ "loss": 0.5131,
2394
+ "step": 19650
2395
+ },
2396
+ {
2397
+ "epoch": 1.61,
2398
+ "learning_rate": 9.298131247283791e-06,
2399
+ "loss": 0.5082,
2400
+ "step": 19700
2401
+ },
2402
+ {
2403
+ "epoch": 1.61,
2404
+ "learning_rate": 9.2709691438505e-06,
2405
+ "loss": 0.5021,
2406
+ "step": 19750
2407
+ },
2408
+ {
2409
+ "epoch": 1.61,
2410
+ "learning_rate": 9.24380704041721e-06,
2411
+ "loss": 0.4695,
2412
+ "step": 19800
2413
+ },
2414
+ {
2415
+ "epoch": 1.62,
2416
+ "learning_rate": 9.21664493698392e-06,
2417
+ "loss": 0.4883,
2418
+ "step": 19850
2419
+ },
2420
+ {
2421
+ "epoch": 1.62,
2422
+ "learning_rate": 9.189482833550632e-06,
2423
+ "loss": 0.4833,
2424
+ "step": 19900
2425
+ },
2426
+ {
2427
+ "epoch": 1.63,
2428
+ "learning_rate": 9.162320730117342e-06,
2429
+ "loss": 0.4642,
2430
+ "step": 19950
2431
+ },
2432
+ {
2433
+ "epoch": 1.63,
2434
+ "learning_rate": 9.13515862668405e-06,
2435
+ "loss": 0.5109,
2436
+ "step": 20000
2437
+ },
2438
+ {
2439
+ "epoch": 1.63,
2440
+ "eval_accuracy": 0.7790117167600611,
2441
+ "eval_loss": 0.5679929852485657,
2442
+ "eval_runtime": 16.5969,
2443
+ "eval_samples_per_second": 591.374,
2444
+ "eval_steps_per_second": 36.995,
2445
+ "step": 20000
2446
+ },
+ {
+ "epoch": 1.63,
+ "learning_rate": 9.107996523250762e-06,
+ "loss": 0.5284,
+ "step": 20050
+ },
+ {
+ "epoch": 1.64,
+ "learning_rate": 9.080834419817471e-06,
+ "loss": 0.4903,
+ "step": 20100
+ },
+ {
+ "epoch": 1.64,
+ "learning_rate": 9.053672316384181e-06,
+ "loss": 0.5407,
+ "step": 20150
+ },
+ {
+ "epoch": 1.65,
+ "learning_rate": 9.026510212950891e-06,
+ "loss": 0.4949,
+ "step": 20200
+ },
+ {
+ "epoch": 1.65,
+ "learning_rate": 8.999348109517601e-06,
+ "loss": 0.5127,
+ "step": 20250
+ },
+ {
+ "epoch": 1.65,
+ "learning_rate": 8.972186006084312e-06,
+ "loss": 0.4523,
+ "step": 20300
+ },
+ {
+ "epoch": 1.66,
+ "learning_rate": 8.945023902651022e-06,
+ "loss": 0.4655,
+ "step": 20350
+ },
+ {
+ "epoch": 1.66,
+ "learning_rate": 8.917861799217732e-06,
+ "loss": 0.5071,
+ "step": 20400
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 8.890699695784442e-06,
+ "loss": 0.4697,
+ "step": 20450
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 8.863537592351152e-06,
+ "loss": 0.499,
+ "step": 20500
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 8.836375488917862e-06,
+ "loss": 0.5156,
+ "step": 20550
+ },
+ {
+ "epoch": 1.68,
+ "learning_rate": 8.809213385484573e-06,
+ "loss": 0.5244,
+ "step": 20600
+ },
+ {
+ "epoch": 1.68,
+ "learning_rate": 8.782051282051283e-06,
+ "loss": 0.5259,
+ "step": 20650
+ },
+ {
+ "epoch": 1.69,
+ "learning_rate": 8.754889178617993e-06,
+ "loss": 0.4975,
+ "step": 20700
+ },
+ {
+ "epoch": 1.69,
+ "learning_rate": 8.727727075184703e-06,
+ "loss": 0.4861,
+ "step": 20750
+ },
+ {
+ "epoch": 1.69,
+ "learning_rate": 8.700564971751413e-06,
+ "loss": 0.5068,
+ "step": 20800
+ },
+ {
+ "epoch": 1.7,
+ "learning_rate": 8.673402868318124e-06,
+ "loss": 0.477,
+ "step": 20850
+ },
+ {
+ "epoch": 1.7,
+ "learning_rate": 8.646240764884834e-06,
+ "loss": 0.4909,
+ "step": 20900
+ },
+ {
+ "epoch": 1.71,
+ "learning_rate": 8.619078661451544e-06,
+ "loss": 0.5286,
+ "step": 20950
+ },
+ {
+ "epoch": 1.71,
+ "learning_rate": 8.591916558018254e-06,
+ "loss": 0.5003,
+ "step": 21000
+ },
+ {
+ "epoch": 1.72,
+ "learning_rate": 8.564754454584963e-06,
+ "loss": 0.4592,
+ "step": 21050
+ },
+ {
+ "epoch": 1.72,
+ "learning_rate": 8.537592351151673e-06,
+ "loss": 0.4945,
+ "step": 21100
+ },
+ {
+ "epoch": 1.72,
+ "learning_rate": 8.510430247718385e-06,
+ "loss": 0.5153,
+ "step": 21150
+ },
+ {
+ "epoch": 1.73,
+ "learning_rate": 8.483268144285095e-06,
+ "loss": 0.488,
+ "step": 21200
+ },
+ {
+ "epoch": 1.73,
+ "learning_rate": 8.456106040851804e-06,
+ "loss": 0.5066,
+ "step": 21250
+ },
+ {
+ "epoch": 1.74,
+ "learning_rate": 8.428943937418514e-06,
+ "loss": 0.506,
+ "step": 21300
+ },
+ {
+ "epoch": 1.74,
+ "learning_rate": 8.401781833985224e-06,
+ "loss": 0.5118,
+ "step": 21350
+ },
+ {
+ "epoch": 1.74,
+ "learning_rate": 8.374619730551936e-06,
+ "loss": 0.4863,
+ "step": 21400
+ },
+ {
+ "epoch": 1.75,
+ "learning_rate": 8.347457627118645e-06,
+ "loss": 0.4974,
+ "step": 21450
+ },
+ {
+ "epoch": 1.75,
+ "learning_rate": 8.320295523685355e-06,
+ "loss": 0.4872,
+ "step": 21500
+ },
+ {
+ "epoch": 1.76,
+ "learning_rate": 8.293133420252065e-06,
+ "loss": 0.4901,
+ "step": 21550
+ },
+ {
+ "epoch": 1.76,
+ "learning_rate": 8.265971316818775e-06,
+ "loss": 0.5113,
+ "step": 21600
+ },
+ {
+ "epoch": 1.76,
+ "learning_rate": 8.238809213385486e-06,
+ "loss": 0.4773,
+ "step": 21650
+ },
+ {
+ "epoch": 1.77,
+ "learning_rate": 8.211647109952196e-06,
+ "loss": 0.4777,
+ "step": 21700
+ },
+ {
+ "epoch": 1.77,
+ "learning_rate": 8.184485006518904e-06,
+ "loss": 0.5108,
+ "step": 21750
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 8.157322903085616e-06,
+ "loss": 0.4692,
+ "step": 21800
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 8.130160799652326e-06,
+ "loss": 0.5289,
+ "step": 21850
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 8.102998696219036e-06,
+ "loss": 0.5053,
+ "step": 21900
+ },
+ {
+ "epoch": 1.79,
+ "learning_rate": 8.075836592785745e-06,
+ "loss": 0.4812,
+ "step": 21950
+ },
+ {
+ "epoch": 1.79,
+ "learning_rate": 8.048674489352455e-06,
+ "loss": 0.4772,
+ "step": 22000
+ },
+ {
+ "epoch": 1.8,
+ "learning_rate": 8.021512385919165e-06,
+ "loss": 0.5205,
+ "step": 22050
+ },
+ {
+ "epoch": 1.8,
+ "learning_rate": 7.994350282485877e-06,
+ "loss": 0.4581,
+ "step": 22100
+ },
+ {
+ "epoch": 1.8,
+ "learning_rate": 7.967188179052586e-06,
+ "loss": 0.4899,
+ "step": 22150
+ },
+ {
+ "epoch": 1.81,
+ "learning_rate": 7.940026075619296e-06,
+ "loss": 0.5192,
+ "step": 22200
+ },
+ {
+ "epoch": 1.81,
+ "learning_rate": 7.912863972186006e-06,
+ "loss": 0.4652,
+ "step": 22250
+ },
+ {
+ "epoch": 1.82,
+ "learning_rate": 7.885701868752716e-06,
+ "loss": 0.4865,
+ "step": 22300
+ },
+ {
+ "epoch": 1.82,
+ "learning_rate": 7.858539765319428e-06,
+ "loss": 0.5024,
+ "step": 22350
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 7.831377661886137e-06,
+ "loss": 0.5156,
+ "step": 22400
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 7.804215558452847e-06,
+ "loss": 0.5068,
+ "step": 22450
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 7.777053455019557e-06,
+ "loss": 0.4798,
+ "step": 22500
+ },
+ {
+ "epoch": 1.84,
+ "learning_rate": 7.749891351586267e-06,
+ "loss": 0.4908,
+ "step": 22550
+ },
+ {
+ "epoch": 1.84,
+ "learning_rate": 7.722729248152978e-06,
+ "loss": 0.4852,
+ "step": 22600
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 7.695567144719688e-06,
+ "loss": 0.5014,
+ "step": 22650
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 7.668405041286398e-06,
+ "loss": 0.4777,
+ "step": 22700
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 7.641242937853108e-06,
+ "loss": 0.4924,
+ "step": 22750
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 7.614080834419818e-06,
+ "loss": 0.4686,
+ "step": 22800
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 7.5869187309865275e-06,
+ "loss": 0.4535,
+ "step": 22850
+ },
+ {
+ "epoch": 1.87,
+ "learning_rate": 7.559756627553238e-06,
+ "loss": 0.488,
+ "step": 22900
+ },
+ {
+ "epoch": 1.87,
+ "learning_rate": 7.532594524119948e-06,
+ "loss": 0.4881,
+ "step": 22950
+ },
+ {
+ "epoch": 1.87,
+ "learning_rate": 7.505432420686659e-06,
+ "loss": 0.4768,
+ "step": 23000
+ },
+ {
+ "epoch": 1.88,
+ "learning_rate": 7.4782703172533686e-06,
+ "loss": 0.4947,
+ "step": 23050
+ },
+ {
+ "epoch": 1.88,
+ "learning_rate": 7.451108213820078e-06,
+ "loss": 0.4965,
+ "step": 23100
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 7.423946110386789e-06,
+ "loss": 0.5233,
+ "step": 23150
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 7.396784006953499e-06,
+ "loss": 0.4837,
+ "step": 23200
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 7.369621903520209e-06,
+ "loss": 0.4922,
+ "step": 23250
+ },
+ {
+ "epoch": 1.9,
+ "learning_rate": 7.3424598000869194e-06,
+ "loss": 0.5411,
+ "step": 23300
+ },
+ {
+ "epoch": 1.9,
+ "learning_rate": 7.315297696653629e-06,
+ "loss": 0.5368,
+ "step": 23350
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 7.288135593220339e-06,
+ "loss": 0.479,
+ "step": 23400
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 7.26097348978705e-06,
+ "loss": 0.438,
+ "step": 23450
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 7.23381138635376e-06,
+ "loss": 0.4863,
+ "step": 23500
+ },
+ {
+ "epoch": 1.92,
+ "learning_rate": 7.20664928292047e-06,
+ "loss": 0.4888,
+ "step": 23550
+ },
+ {
+ "epoch": 1.92,
+ "learning_rate": 7.17948717948718e-06,
+ "loss": 0.4937,
+ "step": 23600
+ },
+ {
+ "epoch": 1.93,
+ "learning_rate": 7.15232507605389e-06,
+ "loss": 0.4749,
+ "step": 23650
+ },
+ {
+ "epoch": 1.93,
+ "learning_rate": 7.125162972620601e-06,
+ "loss": 0.4869,
+ "step": 23700
+ },
+ {
+ "epoch": 1.94,
+ "learning_rate": 7.0980008691873105e-06,
+ "loss": 0.4876,
+ "step": 23750
+ },
+ {
+ "epoch": 1.94,
+ "learning_rate": 7.07083876575402e-06,
+ "loss": 0.4814,
+ "step": 23800
+ },
+ {
+ "epoch": 1.94,
+ "learning_rate": 7.043676662320731e-06,
+ "loss": 0.4893,
+ "step": 23850
+ },
+ {
+ "epoch": 1.95,
+ "learning_rate": 7.016514558887441e-06,
+ "loss": 0.4679,
+ "step": 23900
+ },
+ {
+ "epoch": 1.95,
+ "learning_rate": 6.9893524554541515e-06,
+ "loss": 0.46,
+ "step": 23950
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 6.962190352020861e-06,
+ "loss": 0.4786,
+ "step": 24000
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 6.935028248587571e-06,
+ "loss": 0.5118,
+ "step": 24050
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 6.907866145154282e-06,
+ "loss": 0.4952,
+ "step": 24100
+ },
+ {
+ "epoch": 1.97,
+ "learning_rate": 6.880704041720992e-06,
+ "loss": 0.4985,
+ "step": 24150
+ },
+ {
+ "epoch": 1.97,
+ "learning_rate": 6.8535419382877015e-06,
+ "loss": 0.49,
+ "step": 24200
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 6.826379834854412e-06,
+ "loss": 0.4873,
+ "step": 24250
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 6.799217731421122e-06,
+ "loss": 0.4672,
+ "step": 24300
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 6.772055627987833e-06,
+ "loss": 0.4655,
+ "step": 24350
+ },
+ {
+ "epoch": 1.99,
+ "learning_rate": 6.7448935245545425e-06,
+ "loss": 0.482,
+ "step": 24400
+ },
+ {
+ "epoch": 1.99,
+ "learning_rate": 6.7177314211212515e-06,
+ "loss": 0.4529,
+ "step": 24450
+ },
+ {
+ "epoch": 2.0,
+ "learning_rate": 6.690569317687963e-06,
+ "loss": 0.5119,
+ "step": 24500
+ },
+ {
+ "epoch": 2.0,
+ "learning_rate": 6.663407214254672e-06,
+ "loss": 0.4338,
+ "step": 24550
+ },
+ {
+ "epoch": 2.0,
+ "learning_rate": 6.636245110821382e-06,
+ "loss": 0.4019,
+ "step": 24600
+ },
+ {
+ "epoch": 2.01,
+ "learning_rate": 6.6090830073880925e-06,
+ "loss": 0.3948,
+ "step": 24650
+ },
+ {
+ "epoch": 2.01,
+ "learning_rate": 6.581920903954802e-06,
+ "loss": 0.395,
+ "step": 24700
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 6.554758800521513e-06,
+ "loss": 0.4065,
+ "step": 24750
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 6.527596697088223e-06,
+ "loss": 0.3928,
+ "step": 24800
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 6.500434593654933e-06,
+ "loss": 0.3977,
+ "step": 24850
+ },
+ {
+ "epoch": 2.03,
+ "learning_rate": 6.473272490221643e-06,
+ "loss": 0.3697,
+ "step": 24900
+ },
+ {
+ "epoch": 2.03,
+ "learning_rate": 6.446110386788353e-06,
+ "loss": 0.4261,
+ "step": 24950
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 6.418948283355063e-06,
+ "loss": 0.3754,
+ "step": 25000
+ },
+ {
+ "epoch": 2.04,
+ "eval_accuracy": 0.7889964340295467,
+ "eval_loss": 0.5795576572418213,
+ "eval_runtime": 16.708,
+ "eval_samples_per_second": 587.442,
+ "eval_steps_per_second": 36.749,
+ "step": 25000
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 6.391786179921774e-06,
+ "loss": 0.4236,
+ "step": 25050
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 6.364624076488484e-06,
+ "loss": 0.4392,
+ "step": 25100
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 6.337461973055193e-06,
+ "loss": 0.3784,
+ "step": 25150
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 6.310299869621904e-06,
+ "loss": 0.3771,
+ "step": 25200
+ },
+ {
+ "epoch": 2.06,
+ "learning_rate": 6.283137766188614e-06,
+ "loss": 0.4049,
+ "step": 25250
+ },
+ {
+ "epoch": 2.06,
+ "learning_rate": 6.255975662755325e-06,
+ "loss": 0.4441,
+ "step": 25300
+ },
+ {
+ "epoch": 2.07,
+ "learning_rate": 6.2288135593220344e-06,
+ "loss": 0.3952,
+ "step": 25350
+ },
+ {
+ "epoch": 2.07,
+ "learning_rate": 6.201651455888744e-06,
+ "loss": 0.4196,
+ "step": 25400
+ },
+ {
+ "epoch": 2.07,
+ "learning_rate": 6.174489352455455e-06,
+ "loss": 0.3848,
+ "step": 25450
+ },
+ {
+ "epoch": 2.08,
+ "learning_rate": 6.147327249022165e-06,
+ "loss": 0.396,
+ "step": 25500
+ },
+ {
+ "epoch": 2.08,
+ "learning_rate": 6.120165145588875e-06,
+ "loss": 0.4099,
+ "step": 25550
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 6.093003042155585e-06,
+ "loss": 0.3977,
+ "step": 25600
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 6.065840938722295e-06,
+ "loss": 0.4118,
+ "step": 25650
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 6.038678835289006e-06,
+ "loss": 0.378,
+ "step": 25700
+ },
+ {
+ "epoch": 2.1,
+ "learning_rate": 6.011516731855716e-06,
+ "loss": 0.3767,
+ "step": 25750
+ },
+ {
+ "epoch": 2.1,
+ "learning_rate": 5.9843546284224255e-06,
+ "loss": 0.4023,
+ "step": 25800
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 5.957192524989136e-06,
+ "loss": 0.437,
+ "step": 25850
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 5.930030421555846e-06,
+ "loss": 0.4221,
+ "step": 25900
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 5.902868318122556e-06,
+ "loss": 0.4053,
+ "step": 25950
+ },
+ {
+ "epoch": 2.12,
+ "learning_rate": 5.8757062146892665e-06,
+ "loss": 0.3794,
+ "step": 26000
+ },
+ {
+ "epoch": 2.12,
+ "learning_rate": 5.848544111255976e-06,
+ "loss": 0.3832,
+ "step": 26050
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 5.821382007822687e-06,
+ "loss": 0.373,
+ "step": 26100
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 5.794219904389397e-06,
+ "loss": 0.3986,
+ "step": 26150
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 5.767057800956106e-06,
+ "loss": 0.4046,
+ "step": 26200
+ },
+ {
+ "epoch": 2.14,
+ "learning_rate": 5.739895697522817e-06,
+ "loss": 0.404,
+ "step": 26250
+ },
+ {
+ "epoch": 2.14,
+ "learning_rate": 5.712733594089526e-06,
+ "loss": 0.3856,
+ "step": 26300
+ },
+ {
+ "epoch": 2.15,
+ "learning_rate": 5.685571490656236e-06,
+ "loss": 0.3753,
+ "step": 26350
+ },
+ {
+ "epoch": 2.15,
+ "learning_rate": 5.658409387222948e-06,
+ "loss": 0.4296,
+ "step": 26400
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 5.631247283789657e-06,
+ "loss": 0.3694,
+ "step": 26450
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 5.6040851803563665e-06,
+ "loss": 0.4106,
+ "step": 26500
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 5.576923076923077e-06,
+ "loss": 0.3862,
+ "step": 26550
+ },
+ {
+ "epoch": 2.17,
+ "learning_rate": 5.549760973489787e-06,
+ "loss": 0.4276,
+ "step": 26600
+ },
+ {
+ "epoch": 2.17,
+ "learning_rate": 5.522598870056498e-06,
+ "loss": 0.4068,
+ "step": 26650
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 5.4954367666232076e-06,
+ "loss": 0.3879,
+ "step": 26700
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 5.468274663189917e-06,
+ "loss": 0.422,
+ "step": 26750
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 5.441112559756628e-06,
+ "loss": 0.3924,
+ "step": 26800
+ },
+ {
+ "epoch": 2.19,
+ "learning_rate": 5.413950456323338e-06,
+ "loss": 0.4107,
+ "step": 26850
+ },
+ {
+ "epoch": 2.19,
+ "learning_rate": 5.386788352890048e-06,
+ "loss": 0.4056,
+ "step": 26900
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 5.359626249456758e-06,
+ "loss": 0.3873,
+ "step": 26950
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 5.332464146023468e-06,
+ "loss": 0.409,
+ "step": 27000
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 5.305302042590179e-06,
+ "loss": 0.4301,
+ "step": 27050
+ },
+ {
+ "epoch": 2.21,
+ "learning_rate": 5.278139939156889e-06,
+ "loss": 0.409,
+ "step": 27100
+ },
+ {
+ "epoch": 2.21,
+ "learning_rate": 5.250977835723599e-06,
+ "loss": 0.4243,
+ "step": 27150
+ },
+ {
+ "epoch": 2.22,
+ "learning_rate": 5.223815732290309e-06,
+ "loss": 0.4252,
+ "step": 27200
+ },
+ {
+ "epoch": 2.22,
+ "learning_rate": 5.196653628857019e-06,
+ "loss": 0.405,
+ "step": 27250
+ },
+ {
+ "epoch": 2.22,
+ "learning_rate": 5.169491525423729e-06,
+ "loss": 0.374,
+ "step": 27300
+ },
+ {
+ "epoch": 2.23,
+ "learning_rate": 5.14232942199044e-06,
+ "loss": 0.4005,
+ "step": 27350
+ },
+ {
+ "epoch": 2.23,
+ "learning_rate": 5.1151673185571495e-06,
+ "loss": 0.4009,
+ "step": 27400
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 5.08800521512386e-06,
+ "loss": 0.413,
+ "step": 27450
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 5.06084311169057e-06,
+ "loss": 0.4043,
+ "step": 27500
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 5.03368100825728e-06,
+ "loss": 0.3998,
+ "step": 27550
+ },
+ {
+ "epoch": 2.25,
+ "learning_rate": 5.0065189048239905e-06,
+ "loss": 0.441,
+ "step": 27600
+ },
+ {
+ "epoch": 2.25,
+ "learning_rate": 4.9793568013907e-06,
+ "loss": 0.4,
+ "step": 27650
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 4.95219469795741e-06,
+ "loss": 0.3921,
+ "step": 27700
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 4.92503259452412e-06,
+ "loss": 0.3924,
+ "step": 27750
+ },
+ {
+ "epoch": 2.27,
+ "learning_rate": 4.897870491090831e-06,
+ "loss": 0.4186,
+ "step": 27800
+ },
+ {
+ "epoch": 2.27,
+ "learning_rate": 4.8707083876575405e-06,
+ "loss": 0.3972,
+ "step": 27850
+ },
+ {
+ "epoch": 2.27,
+ "learning_rate": 4.843546284224251e-06,
+ "loss": 0.425,
+ "step": 27900
+ },
+ {
+ "epoch": 2.28,
+ "learning_rate": 4.816384180790961e-06,
+ "loss": 0.4067,
+ "step": 27950
+ },
+ {
+ "epoch": 2.28,
+ "learning_rate": 4.789222077357671e-06,
+ "loss": 0.4474,
+ "step": 28000
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 4.7620599739243815e-06,
+ "loss": 0.4214,
+ "step": 28050
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 4.734897870491091e-06,
+ "loss": 0.3861,
+ "step": 28100
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 4.707735767057801e-06,
+ "loss": 0.4119,
+ "step": 28150
+ },
+ {
+ "epoch": 2.3,
+ "learning_rate": 4.680573663624511e-06,
+ "loss": 0.4046,
+ "step": 28200
+ },
+ {
+ "epoch": 2.3,
+ "learning_rate": 4.653411560191222e-06,
+ "loss": 0.3846,
+ "step": 28250
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 4.6262494567579315e-06,
+ "loss": 0.4105,
+ "step": 28300
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 4.599087353324641e-06,
+ "loss": 0.4202,
+ "step": 28350
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 4.571925249891352e-06,
+ "loss": 0.3995,
+ "step": 28400
+ },
+ {
+ "epoch": 2.32,
+ "learning_rate": 4.544763146458062e-06,
+ "loss": 0.4221,
+ "step": 28450
+ },
+ {
+ "epoch": 2.32,
+ "learning_rate": 4.5176010430247726e-06,
+ "loss": 0.4233,
+ "step": 28500
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 4.490438939591482e-06,
+ "loss": 0.4016,
+ "step": 28550
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 4.463276836158192e-06,
+ "loss": 0.3899,
+ "step": 28600
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 4.436114732724903e-06,
+ "loss": 0.4105,
+ "step": 28650
+ },
+ {
+ "epoch": 2.34,
+ "learning_rate": 4.408952629291613e-06,
+ "loss": 0.3675,
+ "step": 28700
+ },
+ {
+ "epoch": 2.34,
+ "learning_rate": 4.381790525858323e-06,
+ "loss": 0.3944,
+ "step": 28750
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 4.354628422425033e-06,
+ "loss": 0.4149,
+ "step": 28800
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 4.327466318991743e-06,
+ "loss": 0.3901,
+ "step": 28850
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 4.300304215558454e-06,
+ "loss": 0.3982,
+ "step": 28900
+ },
+ {
+ "epoch": 2.36,
+ "learning_rate": 4.273142112125163e-06,
+ "loss": 0.4135,
+ "step": 28950
+ },
+ {
+ "epoch": 2.36,
+ "learning_rate": 4.2459800086918734e-06,
+ "loss": 0.4122,
+ "step": 29000
+ },
+ {
+ "epoch": 2.37,
+ "learning_rate": 4.218817905258584e-06,
+ "loss": 0.3816,
+ "step": 29050
+ },
+ {
+ "epoch": 2.37,
+ "learning_rate": 4.191655801825294e-06,
+ "loss": 0.3875,
+ "step": 29100
+ },
+ {
+ "epoch": 2.38,
+ "learning_rate": 4.164493698392004e-06,
+ "loss": 0.4247,
+ "step": 29150
+ },
+ {
+ "epoch": 2.38,
+ "learning_rate": 4.137331594958714e-06,
+ "loss": 0.4301,
+ "step": 29200
+ },
+ {
+ "epoch": 2.38,
+ "learning_rate": 4.110169491525424e-06,
+ "loss": 0.4442,
+ "step": 29250
+ },
+ {
+ "epoch": 2.39,
+ "learning_rate": 4.083007388092134e-06,
+ "loss": 0.4154,
+ "step": 29300
+ },
+ {
+ "epoch": 2.39,
+ "learning_rate": 4.055845284658844e-06,
+ "loss": 0.4089,
+ "step": 29350
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 4.028683181225555e-06,
+ "loss": 0.3797,
+ "step": 29400
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 4.0015210777922645e-06,
+ "loss": 0.4433,
+ "step": 29450
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 3.974358974358974e-06,
+ "loss": 0.3922,
+ "step": 29500
+ },
+ {
+ "epoch": 2.41,
+ "learning_rate": 3.947196870925685e-06,
+ "loss": 0.4504,
+ "step": 29550
+ },
+ {
+ "epoch": 2.41,
+ "learning_rate": 3.920034767492395e-06,
+ "loss": 0.3998,
+ "step": 29600
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 3.8928726640591055e-06,
+ "loss": 0.3771,
+ "step": 29650
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 3.865710560625815e-06,
+ "loss": 0.4342,
+ "step": 29700
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 3.838548457192525e-06,
+ "loss": 0.4114,
+ "step": 29750
+ },
+ {
+ "epoch": 2.43,
+ "learning_rate": 3.8113863537592354e-06,
+ "loss": 0.3966,
+ "step": 29800
+ },
+ {
+ "epoch": 2.43,
+ "learning_rate": 3.7842242503259457e-06,
+ "loss": 0.411,
+ "step": 29850
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 3.7570621468926555e-06,
+ "loss": 0.4119,
+ "step": 29900
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 3.7299000434593658e-06,
+ "loss": 0.4471,
+ "step": 29950
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 3.702737940026076e-06,
+ "loss": 0.3989,
+ "step": 30000
+ },
+ {
+ "epoch": 2.44,
+ "eval_accuracy": 0.7892002037697402,
+ "eval_loss": 0.5580592751502991,
+ "eval_runtime": 16.7363,
+ "eval_samples_per_second": 586.449,
+ "eval_steps_per_second": 36.687,
+ "step": 30000
+ },
+ {
+ "epoch": 2.45,
+ "learning_rate": 3.6755758365927863e-06,
+ "loss": 0.3757,
+ "step": 30050
+ },
+ {
+ "epoch": 2.45,
+ "learning_rate": 3.648413733159496e-06,
+ "loss": 0.3962,
+ "step": 30100
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 3.6212516297262064e-06,
+ "loss": 0.4181,
+ "step": 30150
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 3.5940895262929166e-06,
+ "loss": 0.4102,
+ "step": 30200
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 3.566927422859627e-06,
+ "loss": 0.4232,
+ "step": 30250
+ },
+ {
+ "epoch": 2.47,
+ "learning_rate": 3.5397653194263363e-06,
+ "loss": 0.4225,
+ "step": 30300
+ },
+ {
+ "epoch": 2.47,
+ "learning_rate": 3.5126032159930466e-06,
+ "loss": 0.4211,
+ "step": 30350
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 3.4854411125597572e-06,
+ "loss": 0.4046,
+ "step": 30400
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 3.4582790091264675e-06,
+ "loss": 0.424,
+ "step": 30450
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 3.431116905693177e-06,
+ "loss": 0.4039,
+ "step": 30500
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 3.403954802259887e-06,
+ "loss": 0.4225,
+ "step": 30550
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 3.3767926988265974e-06,
+ "loss": 0.3794,
+ "step": 30600
+ },
+ {
+ "epoch": 2.5,
+ "learning_rate": 3.3496305953933073e-06,
+ "loss": 0.4136,
+ "step": 30650
+ },
+ {
+ "epoch": 2.5,
+ "learning_rate": 3.3224684919600175e-06,
+ "loss": 0.3917,
+ "step": 30700
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 3.2953063885267278e-06,
+ "loss": 0.3823,
+ "step": 30750
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 3.268144285093438e-06,
+ "loss": 0.3955,
+ "step": 30800
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 3.240982181660148e-06,
+ "loss": 0.4004,
+ "step": 30850
+ },
+ {
+ "epoch": 2.52,
+ "learning_rate": 3.213820078226858e-06,
+ "loss": 0.4052,
+ "step": 30900
+ },
+ {
+ "epoch": 2.52,
+ "learning_rate": 3.1866579747935684e-06,
+ "loss": 0.4154,
+ "step": 30950
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 3.1594958713602786e-06,
+ "loss": 0.3727,
+ "step": 31000
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 3.1323337679269885e-06,
+ "loss": 0.3971,
+ "step": 31050
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 3.1051716644936987e-06,
+ "loss": 0.3984,
+ "step": 31100
+ },
+ {
+ "epoch": 2.54,
+ "learning_rate": 3.078009561060409e-06,
+ "loss": 0.3918,
+ "step": 31150
+ },
+ {
+ "epoch": 2.54,
+ "learning_rate": 3.0508474576271192e-06,
+ "loss": 0.4079,
+ "step": 31200
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 3.0236853541938286e-06,
+ "loss": 0.3879,
+ "step": 31250
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 2.996523250760539e-06,
+ "loss": 0.3808,
+ "step": 31300
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 2.969361147327249e-06,
+ "loss": 0.4068,
+ "step": 31350
+ },
+ {
+ "epoch": 2.56,
+ "learning_rate": 2.9421990438939594e-06,
+ "loss": 0.4029,
+ "step": 31400
+ },
+ {
+ "epoch": 2.56,
+ "learning_rate": 2.9150369404606692e-06,
+ "loss": 0.3524,
+ "step": 31450
+ },
+ {
+ "epoch": 2.57,
+ "learning_rate": 2.8878748370273795e-06,
+ "loss": 0.4021,
+ "step": 31500
+ },
+ {
+ "epoch": 2.57,
+ "learning_rate": 2.8607127335940898e-06,
+ "loss": 0.3924,
+ "step": 31550
+ },
+ {
+ "epoch": 2.57,
+ "learning_rate": 2.8335506301608e-06,
+ "loss": 0.4303,
+ "step": 31600
+ },
+ {
+ "epoch": 2.58,
+ "learning_rate": 2.80638852672751e-06,
+ "loss": 0.3993,
+ "step": 31650
+ },
+ {
+ "epoch": 2.58,
+ "learning_rate": 2.77922642329422e-06,
+ "loss": 0.3992,
+ "step": 31700
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 2.7520643198609304e-06,
+ "loss": 0.4215,
+ "step": 31750
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 2.7249022164276406e-06,
+ "loss": 0.4107,
+ "step": 31800
+ },
+ {
+ "epoch": 2.6,
+ "learning_rate": 2.6977401129943505e-06,
+ "loss": 0.3876,
+ "step": 31850
+ },
+ {
+ "epoch": 2.6,
+ "learning_rate": 2.6705780095610607e-06,
+ "loss": 0.3873,
+ "step": 31900
+ },
+ {
+ "epoch": 2.6,
+ "learning_rate": 2.643415906127771e-06,
+ "loss": 0.4126,
+ "step": 31950
+ },
+ {
+ "epoch": 2.61,
+ "learning_rate": 2.6162538026944812e-06,
+ "loss": 0.4173,
+ "step": 32000
+ },
3905
+ {
3906
+ "epoch": 2.61,
3907
+ "learning_rate": 2.589091699261191e-06,
3908
+ "loss": 0.409,
3909
+ "step": 32050
3910
+ },
3911
+ {
3912
+ "epoch": 2.62,
3913
+ "learning_rate": 2.5619295958279013e-06,
3914
+ "loss": 0.4129,
3915
+ "step": 32100
3916
+ },
3917
+ {
3918
+ "epoch": 2.62,
3919
+ "learning_rate": 2.5347674923946116e-06,
3920
+ "loss": 0.3676,
3921
+ "step": 32150
3922
+ },
3923
+ {
3924
+ "epoch": 2.62,
3925
+ "learning_rate": 2.507605388961321e-06,
3926
+ "loss": 0.3939,
3927
+ "step": 32200
3928
+ },
3929
+ {
3930
+ "epoch": 2.63,
3931
+ "learning_rate": 2.4804432855280312e-06,
3932
+ "loss": 0.3828,
3933
+ "step": 32250
3934
+ },
3935
+ {
3936
+ "epoch": 2.63,
3937
+ "learning_rate": 2.4532811820947415e-06,
3938
+ "loss": 0.418,
3939
+ "step": 32300
3940
+ },
3941
+ {
3942
+ "epoch": 2.64,
3943
+ "learning_rate": 2.4261190786614517e-06,
3944
+ "loss": 0.4142,
3945
+ "step": 32350
3946
+ },
3947
+ {
3948
+ "epoch": 2.64,
3949
+ "learning_rate": 2.398956975228162e-06,
3950
+ "loss": 0.398,
3951
+ "step": 32400
3952
+ },
3953
+ {
3954
+ "epoch": 2.64,
3955
+ "learning_rate": 2.371794871794872e-06,
3956
+ "loss": 0.4077,
3957
+ "step": 32450
3958
+ },
3959
+ {
3960
+ "epoch": 2.65,
3961
+ "learning_rate": 2.344632768361582e-06,
3962
+ "loss": 0.412,
3963
+ "step": 32500
3964
+ },
3965
+ {
3966
+ "epoch": 2.65,
3967
+ "learning_rate": 2.3174706649282924e-06,
3968
+ "loss": 0.4163,
3969
+ "step": 32550
3970
+ },
3971
+ {
3972
+ "epoch": 2.66,
3973
+ "learning_rate": 2.2903085614950026e-06,
3974
+ "loss": 0.4202,
3975
+ "step": 32600
3976
+ },
3977
+ {
3978
+ "epoch": 2.66,
3979
+ "learning_rate": 2.2631464580617124e-06,
3980
+ "loss": 0.4121,
3981
+ "step": 32650
3982
+ },
3983
+ {
3984
+ "epoch": 2.66,
3985
+ "learning_rate": 2.2359843546284227e-06,
3986
+ "loss": 0.4,
3987
+ "step": 32700
3988
+ },
3989
+ {
3990
+ "epoch": 2.67,
3991
+ "learning_rate": 2.2088222511951325e-06,
3992
+ "loss": 0.4008,
3993
+ "step": 32750
3994
+ },
3995
+ {
3996
+ "epoch": 2.67,
3997
+ "learning_rate": 2.1816601477618428e-06,
3998
+ "loss": 0.4103,
3999
+ "step": 32800
4000
+ },
4001
+ {
4002
+ "epoch": 2.68,
4003
+ "learning_rate": 2.154498044328553e-06,
4004
+ "loss": 0.3995,
4005
+ "step": 32850
4006
+ },
4007
+ {
4008
+ "epoch": 2.68,
4009
+ "learning_rate": 2.1273359408952633e-06,
4010
+ "loss": 0.3924,
4011
+ "step": 32900
4012
+ },
4013
+ {
+ "epoch": 2.68,
+ "learning_rate": 2.100173837461973e-06,
+ "loss": 0.431,
+ "step": 32950
+ },
+ {
+ "epoch": 2.69,
+ "learning_rate": 2.0730117340286834e-06,
+ "loss": 0.4029,
+ "step": 33000
+ },
+ {
+ "epoch": 2.69,
+ "learning_rate": 2.0458496305953932e-06,
+ "loss": 0.3988,
+ "step": 33050
+ },
+ {
+ "epoch": 2.7,
+ "learning_rate": 2.018687527162104e-06,
+ "loss": 0.4012,
+ "step": 33100
+ },
+ {
+ "epoch": 2.7,
+ "learning_rate": 1.9915254237288137e-06,
+ "loss": 0.3842,
+ "step": 33150
+ },
+ {
+ "epoch": 2.71,
+ "learning_rate": 1.964363320295524e-06,
+ "loss": 0.3976,
+ "step": 33200
+ },
+ {
+ "epoch": 2.71,
+ "learning_rate": 1.937201216862234e-06,
+ "loss": 0.4081,
+ "step": 33250
+ },
+ {
+ "epoch": 2.71,
+ "learning_rate": 1.910039113428944e-06,
+ "loss": 0.3973,
+ "step": 33300
+ },
+ {
+ "epoch": 2.72,
+ "learning_rate": 1.8828770099956541e-06,
+ "loss": 0.4087,
+ "step": 33350
+ },
+ {
+ "epoch": 2.72,
+ "learning_rate": 1.8557149065623644e-06,
+ "loss": 0.387,
+ "step": 33400
+ },
+ {
+ "epoch": 2.73,
+ "learning_rate": 1.8285528031290744e-06,
+ "loss": 0.3997,
+ "step": 33450
+ },
+ {
+ "epoch": 2.73,
+ "learning_rate": 1.8013906996957845e-06,
+ "loss": 0.3971,
+ "step": 33500
+ },
+ {
+ "epoch": 2.73,
+ "learning_rate": 1.7742285962624947e-06,
+ "loss": 0.3768,
+ "step": 33550
+ },
+ {
+ "epoch": 2.74,
+ "learning_rate": 1.7470664928292048e-06,
+ "loss": 0.364,
+ "step": 33600
+ },
+ {
+ "epoch": 2.74,
+ "learning_rate": 1.719904389395915e-06,
+ "loss": 0.4112,
+ "step": 33650
+ },
+ {
+ "epoch": 2.75,
+ "learning_rate": 1.6927422859626249e-06,
+ "loss": 0.4038,
+ "step": 33700
+ },
+ {
+ "epoch": 2.75,
+ "learning_rate": 1.6655801825293353e-06,
+ "loss": 0.365,
+ "step": 33750
+ },
+ {
+ "epoch": 2.75,
+ "learning_rate": 1.6384180790960452e-06,
+ "loss": 0.4264,
+ "step": 33800
+ },
+ {
+ "epoch": 2.76,
+ "learning_rate": 1.6112559756627554e-06,
+ "loss": 0.4099,
+ "step": 33850
+ },
+ {
+ "epoch": 2.76,
+ "learning_rate": 1.5840938722294655e-06,
+ "loss": 0.3823,
+ "step": 33900
+ },
+ {
+ "epoch": 2.77,
+ "learning_rate": 1.5569317687961757e-06,
+ "loss": 0.427,
+ "step": 33950
+ },
+ {
+ "epoch": 2.77,
+ "learning_rate": 1.5297696653628858e-06,
+ "loss": 0.3966,
+ "step": 34000
+ },
+ {
+ "epoch": 2.77,
+ "learning_rate": 1.502607561929596e-06,
+ "loss": 0.4186,
+ "step": 34050
+ },
+ {
+ "epoch": 2.78,
+ "learning_rate": 1.475445458496306e-06,
+ "loss": 0.3757,
+ "step": 34100
+ },
+ {
+ "epoch": 2.78,
+ "learning_rate": 1.4482833550630163e-06,
+ "loss": 0.3756,
+ "step": 34150
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 1.4211212516297262e-06,
+ "loss": 0.3846,
+ "step": 34200
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 1.3939591481964364e-06,
+ "loss": 0.425,
+ "step": 34250
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 1.3667970447631465e-06,
+ "loss": 0.4,
+ "step": 34300
+ },
+ {
+ "epoch": 2.8,
+ "learning_rate": 1.3396349413298567e-06,
+ "loss": 0.4014,
+ "step": 34350
+ },
+ {
+ "epoch": 2.8,
+ "learning_rate": 1.3124728378965668e-06,
+ "loss": 0.4189,
+ "step": 34400
+ },
+ {
+ "epoch": 2.81,
+ "learning_rate": 1.285310734463277e-06,
+ "loss": 0.4223,
+ "step": 34450
+ },
+ {
+ "epoch": 2.81,
+ "learning_rate": 1.258148631029987e-06,
+ "loss": 0.3971,
+ "step": 34500
+ },
+ {
+ "epoch": 2.82,
+ "learning_rate": 1.2309865275966971e-06,
+ "loss": 0.4106,
+ "step": 34550
+ },
+ {
+ "epoch": 2.82,
+ "learning_rate": 1.2038244241634074e-06,
+ "loss": 0.4021,
+ "step": 34600
+ },
+ {
+ "epoch": 2.82,
+ "learning_rate": 1.1766623207301174e-06,
+ "loss": 0.4022,
+ "step": 34650
+ },
+ {
+ "epoch": 2.83,
+ "learning_rate": 1.1495002172968275e-06,
+ "loss": 0.3845,
+ "step": 34700
+ },
+ {
+ "epoch": 2.83,
+ "learning_rate": 1.1223381138635377e-06,
+ "loss": 0.3963,
+ "step": 34750
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 1.0951760104302478e-06,
+ "loss": 0.4259,
+ "step": 34800
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 1.068013906996958e-06,
+ "loss": 0.434,
+ "step": 34850
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 1.040851803563668e-06,
+ "loss": 0.3829,
+ "step": 34900
+ },
+ {
+ "epoch": 2.85,
+ "learning_rate": 1.0136897001303781e-06,
+ "loss": 0.3953,
+ "step": 34950
+ },
+ {
+ "epoch": 2.85,
+ "learning_rate": 9.865275966970884e-07,
+ "loss": 0.4013,
+ "step": 35000
+ },
+ {
+ "epoch": 2.85,
+ "eval_accuracy": 0.7955170657157412,
+ "eval_loss": 0.5501273274421692,
+ "eval_runtime": 16.8148,
+ "eval_samples_per_second": 583.712,
+ "eval_steps_per_second": 36.515,
+ "step": 35000
+ },
+ {
+ "epoch": 2.86,
+ "learning_rate": 9.593654932637984e-07,
+ "loss": 0.3919,
+ "step": 35050
+ },
+ {
+ "epoch": 2.86,
+ "learning_rate": 9.322033898305086e-07,
+ "loss": 0.4299,
+ "step": 35100
+ },
+ {
+ "epoch": 2.86,
+ "learning_rate": 9.050412863972187e-07,
+ "loss": 0.4219,
+ "step": 35150
+ },
+ {
+ "epoch": 2.87,
+ "learning_rate": 8.778791829639289e-07,
+ "loss": 0.3997,
+ "step": 35200
+ },
+ {
+ "epoch": 2.87,
+ "learning_rate": 8.507170795306389e-07,
+ "loss": 0.4138,
+ "step": 35250
+ },
+ {
+ "epoch": 2.88,
+ "learning_rate": 8.235549760973491e-07,
+ "loss": 0.3842,
+ "step": 35300
+ },
+ {
+ "epoch": 2.88,
+ "learning_rate": 7.963928726640592e-07,
+ "loss": 0.4033,
+ "step": 35350
+ },
+ {
+ "epoch": 2.88,
+ "learning_rate": 7.692307692307694e-07,
+ "loss": 0.4052,
+ "step": 35400
+ },
+ {
+ "epoch": 2.89,
+ "learning_rate": 7.420686657974795e-07,
+ "loss": 0.4042,
+ "step": 35450
+ },
+ {
+ "epoch": 2.89,
+ "learning_rate": 7.149065623641896e-07,
+ "loss": 0.4045,
+ "step": 35500
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 6.877444589308997e-07,
+ "loss": 0.4073,
+ "step": 35550
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 6.605823554976099e-07,
+ "loss": 0.4266,
+ "step": 35600
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 6.3342025206432e-07,
+ "loss": 0.3942,
+ "step": 35650
+ },
+ {
+ "epoch": 2.91,
+ "learning_rate": 6.0625814863103e-07,
+ "loss": 0.4103,
+ "step": 35700
+ },
+ {
+ "epoch": 2.91,
+ "learning_rate": 5.790960451977402e-07,
+ "loss": 0.4249,
+ "step": 35750
+ },
+ {
+ "epoch": 2.92,
+ "learning_rate": 5.519339417644502e-07,
+ "loss": 0.4074,
+ "step": 35800
+ },
+ {
+ "epoch": 2.92,
+ "learning_rate": 5.247718383311604e-07,
+ "loss": 0.4009,
+ "step": 35850
+ },
+ {
+ "epoch": 2.93,
+ "learning_rate": 4.976097348978705e-07,
+ "loss": 0.3856,
+ "step": 35900
+ },
+ {
+ "epoch": 2.93,
+ "learning_rate": 4.704476314645807e-07,
+ "loss": 0.3941,
+ "step": 35950
+ },
+ {
+ "epoch": 2.93,
+ "learning_rate": 4.432855280312908e-07,
+ "loss": 0.4133,
+ "step": 36000
+ },
+ {
+ "epoch": 2.94,
+ "learning_rate": 4.1612342459800084e-07,
+ "loss": 0.4005,
+ "step": 36050
+ },
+ {
+ "epoch": 2.94,
+ "learning_rate": 3.88961321164711e-07,
+ "loss": 0.368,
+ "step": 36100
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 3.6179921773142114e-07,
+ "loss": 0.4314,
+ "step": 36150
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 3.3463711429813124e-07,
+ "loss": 0.4145,
+ "step": 36200
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 3.074750108648414e-07,
+ "loss": 0.4122,
+ "step": 36250
+ },
+ {
+ "epoch": 2.96,
+ "learning_rate": 2.803129074315515e-07,
+ "loss": 0.4063,
+ "step": 36300
+ },
+ {
+ "epoch": 2.96,
+ "learning_rate": 2.5315080399826164e-07,
+ "loss": 0.415,
+ "step": 36350
+ },
+ {
+ "epoch": 2.97,
+ "learning_rate": 2.2598870056497177e-07,
+ "loss": 0.3831,
+ "step": 36400
+ },
+ {
+ "epoch": 2.97,
+ "learning_rate": 1.988265971316819e-07,
+ "loss": 0.3935,
+ "step": 36450
+ },
+ {
+ "epoch": 2.97,
+ "learning_rate": 1.7166449369839202e-07,
+ "loss": 0.3924,
+ "step": 36500
+ },
+ {
+ "epoch": 2.98,
+ "learning_rate": 1.4450239026510214e-07,
+ "loss": 0.375,
+ "step": 36550
+ },
+ {
+ "epoch": 2.98,
+ "learning_rate": 1.1734028683181226e-07,
+ "loss": 0.4107,
+ "step": 36600
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 9.017818339852239e-08,
+ "loss": 0.4141,
+ "step": 36650
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 6.301607996523251e-08,
+ "loss": 0.4065,
+ "step": 36700
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 3.585397653194264e-08,
+ "loss": 0.3686,
+ "step": 36750
+ },
+ {
+ "epoch": 3.0,
+ "learning_rate": 8.691873098652761e-09,
+ "loss": 0.4018,
+ "step": 36800
+ },
+ {
+ "epoch": 3.0,
+ "step": 36816,
+ "total_flos": 7.237278554846584e+16,
+ "train_loss": 0.5138862587508302,
+ "train_runtime": 6461.7512,
+ "train_samples_per_second": 182.32,
+ "train_steps_per_second": 5.698
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 36816,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 5000,
+ "total_flos": 7.237278554846584e+16,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }