takarajordan
Fix training parameters and fix cross entropy loss for padding tokens.
4bde2d1 verified
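
The commit message does not show the actual change, so the following is only a minimal sketch of what excluding padding tokens from a cross-entropy loss usually means: positions whose target is the padding id are skipped, and the loss is averaged over the remaining positions. The names `PAD_ID` and `masked_cross_entropy` are hypothetical and not taken from this repository, and the padding id of 0 is an assumption.

```python
import math

PAD_ID = 0  # hypothetical padding token id; the commit does not state the real one


def masked_cross_entropy(logits, targets, pad_id=PAD_ID):
    """Mean cross-entropy over non-padding positions.

    logits:  list of per-position score lists (one score per vocabulary entry)
    targets: list of target token ids, same length as logits
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == pad_id:
            continue  # padding positions contribute nothing to the loss
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # negative log-probability of the target
        count += 1
    # Guard against a sequence that consists only of padding.
    return total / count if count else 0.0
```

In PyTorch the same effect is typically obtained by passing `ignore_index` to `torch.nn.functional.cross_entropy`; whether this commit uses that mechanism is not stated here.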