sfedar
End of training, 9 epochs, 4 batch size, writer batch size: 1000, 1 gradient accumulation steps, learning rate: 5e-05, 30 s
99a36b5 verified