End of training, 12 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 0.0001, 30 s cfd5112 verified sfedar committed 12 days ago
End of training, 11 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 3e-05, 30 s b17058e verified sfedar committed 15 days ago