End of training, 10 epochs, 4 batch size, writer batch size: 500, 16 gradient accumulation steps, learning rate: 5e-05, 30 s · acb2dd3 · verified · sfedar committed on Sep 8
End of training, 10 epochs, 8 batch size, writer batch size: 500, 1 gradient accumulation steps, learning rate: 5e-05, 30 s · 00f8a0a · verified · sfedar committed on Sep 7
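The two runs share the same epoch count and learning rate but differ in how samples are grouped into optimizer updates. The sketch below computes the effective batch size (per-device batch size × gradient accumulation steps × device count). It assumes single-device training, because the log does not state how many devices were used. The helper `effective_batch_size` and the `runs` dictionary are hypothetical and only restate the numbers from the two commit messages.

```python
# Hypothetical helper; values are copied from the two commit messages.
# Assumption: single-device training (device count is not given in the log).

def effective_batch_size(per_device_batch_size: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples whose gradients are combined into one optimizer update."""
    return per_device_batch_size * grad_accum_steps * num_devices


runs = {
    "acb2dd3": {"batch_size": 4, "grad_accum_steps": 16},  # Sep 8 run
    "00f8a0a": {"batch_size": 8, "grad_accum_steps": 1},   # Sep 7 run
}

for commit, cfg in runs.items():
    ebs = effective_batch_size(cfg["batch_size"], cfg["grad_accum_steps"])
    print(f"{commit}: effective batch size = {ebs}")
# acb2dd3: effective batch size = 64
# 00f8a0a: effective batch size = 8
```

Under that assumption, the Sep 8 run averages gradients over 8× more samples per update and, on the same dataset, takes roughly 8× fewer optimizer steps per epoch at the same learning rate of 5e-05.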