End of training: 10 epochs, batch size 4, writer batch size 500, 16 gradient accumulation steps, learning rate 5e-05, 30 s
acb2dd3
verified
sfedar committed on
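The hyperparameters in the commit message imply an effective batch size (examples per optimizer update) of batch size × gradient accumulation steps. The sketch below assumes "batch size 4" is a per-device value and that training ran on a single device. The commit does not state the device count, so `NUM_DEVICES` is a hypothetical placeholder. The writer batch size is left out on purpose. If it refers to the `writer_batch_size` parameter of Hugging Face `datasets` (also an assumption), it controls how many rows are buffered per write to the cache file. It does not change the training math.

```python
# Hyperparameters copied from the commit message.
# Assumption: "batch size" is per device, and training used a single device.
NUM_EPOCHS = 10
PER_DEVICE_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 16
LEARNING_RATE = 5e-05
NUM_DEVICES = 1  # hypothetical: the device count is not stated in the commit


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Return the number of examples that contribute to one optimizer update."""
    # Gradients from `accum_steps` micro-batches are summed before each update,
    # and every device processes its own micro-batch in parallel.
    return per_device * accum_steps * devices


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
```

Under these assumptions the script prints 64. Adding devices scales the effective batch size linearly unless the accumulation steps are reduced to compensate.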