End of training, 10+1 epochs, 4 batch size, writer batch size: 1000, 1 gradient accumulation steps, learning rate: 5e-05, 30 s
9d65f14 verified