Fix training parameters and the cross-entropy loss for padding tokens.

#1
Ready to merge
This branch is ready to get merged automatically.
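
The title does not say how the loss was fixed. One common way to handle padding tokens in a cross-entropy loss is to leave padded positions out of the average, so the sketch below illustrates that idea only. It is not taken from this branch: `PAD_ID`, `log_softmax`, and `masked_cross_entropy` are hypothetical names, and the padding id of 0 is an assumption.

```python
import math

PAD_ID = 0  # assumed padding token id (hypothetical, not from this branch)


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_cross_entropy(logits, targets, pad_id=PAD_ID):
    """Mean cross-entropy over non-padding positions only.

    logits:  list of per-token logit rows, one row per position
    targets: list of target token ids, same length as logits
    """
    total = 0.0
    count = 0
    for row, target in zip(logits, targets):
        if target == pad_id:
            continue  # padded position: contributes nothing to the loss
        total += -log_softmax(row)[target]
        count += 1
    # Return 0.0 when every position is padding to avoid dividing by zero.
    return total / count if count else 0.0


# Two real tokens with uniform logits, followed by one padded position.
example_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [5.0, 1.0, 1.0, 1.0]]
example_targets = [1, 2, PAD_ID]
loss = masked_cross_entropy(example_logits, example_targets)
```

Averaging over the non-padding count, rather than over the full sequence length, keeps the loss from shrinking as more padding is added to a batch. Frameworks often expose the same behavior directly; in PyTorch, for instance, `torch.nn.CrossEntropyLoss` accepts an `ignore_index` argument for this purpose.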
