# Masakhane - Machine Translation for African Languages (Using JoeyNMT)

## Note before beginning:
### - The idea is that you should be able to make minimal changes to this in order to get SOME result for your own translation corpus. 

### - The tl;dr: Go to the **"TODO"** comments which will tell you what to update to get up and running

### - If you actually want to have a clue what you're doing, read the text and peek at the links

### - With 100 epochs, it should take around 7 hours to run in Google Colab

### - Once you've gotten a result for your language, please attach and email your notebook that generated it to masakhanetranslation@gmail.com

### - If you care enough and get a chance, writing a brief background on your language would be amazing. See examples in [(Martinus, 2019)](https://arxiv.org/abs/1906.05685)

## Retrieve your data & make a parallel corpus

If you want to use the JW300 data referenced on the Masakhane website or in our GitHub repo, you can use `opus-tools` to convert the data into a convenient format. `opus_read` from that package reads the native aligned XML files and converts them to TMX format (the cells below use its `moses` write mode, which writes one plain-text file per language). The tool can also fetch the relevant files from OPUS on the fly and filter the data as necessary. [Read the documentation](https://pypi.org/project/opustools-pkg/) for more details.

Once you have your corpus files in TMX format (an XML structure which holds the sentences in your source language and your target language in a single file), we recommend reading them into a pandas dataframe. Thankfully, Jade wrote a silly `tmx2dataframe` package which converts your TMX file to a pandas dataframe.
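To make the TMX structure concrete, here is a minimal sketch that reads aligned segments with the standard library only. It is not the `tmx2dataframe` implementation: the helper name `tmx_to_pairs` and the tiny `SAMPLE_TMX` document are assumptions made for illustration, and real TMX files carry more header metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) tuples for every translation unit that has both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs

# Hypothetical two-language TMX snippet, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>What can we do to make wise use of our freedom ?</seg></tuv>
      <tuv xml:lang="pcm"><seg>Wetin go help us use our freedom well ?</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = tmx_to_pairs(SAMPLE_TMX, "en", "pcm")
print(pairs)
```

The resulting list of tuples can be passed to `pd.DataFrame(pairs, columns=['source_sentence', 'target_sentence'])` to get the same column layout used later in this notebook.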

In [0]:
"""from google.colab import drive
drive.mount('/content/drive')"""

In [1]:
# TODO: Set your source and target languages. Keep in mind, these traditionally use language codes as found here:
# These will also become the suffixes of all vocab and corpus files used throughout
import os
source_language = "en"
target_language = "pcm" 
lc = False  # If True, lowercase the data.
seed = 42  # Random seed for shuffling.
tag = "jw300-baseline" # Give a unique name to your folder - this is to ensure you don't rewrite any models you've already submitted

os.environ["src"] = source_language # Sets them in bash as well, since we often use bash scripts
os.environ["tgt"] = target_language
os.environ["tag"] = tag


# This will save it to a folder in our gdrive instead!
!mkdir -p "en_pcm/$src-$tgt-$tag"
os.environ["experiment_path"] = "en_pcm/%s-%s-%s" % (source_language, target_language, tag)

In [2]:
!mkdir -p "en_pcm/$src-$tgt-$tag"

In [3]:
!echo $experiment_path

en_pcm/en-pcm-jw300-baseline


In [4]:
# Install opus-tools
! pip install opustools-pkg

You are using pip version 10.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [5]:
# Downloading our corpus
! opus_read -d JW300 -s $src -t $tgt -wm moses -w jw300.$src jw300.$tgt -q 

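# extract the corpus file
# (the original `--y` is not a gzip option and fails with "unrecognized option", as in the recorded output below; `-f` overwrites without prompting)
! gunzip -f JW300_latest_xml_$src-$tgt.xml.gz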


Alignment file /proj/nlpl/data/OPUS/JW300/latest/xml/en-pcm.xml.gz not found. The following files are available for downloading:

 256 KB https://object.pouta.csc.fi/OPUS-JW300/v1/xml/en-pcm.xml.gz
 263 MB https://object.pouta.csc.fi/OPUS-JW300/v1/xml/en.zip
   3 MB https://object.pouta.csc.fi/OPUS-JW300/v1/xml/pcm.zip

 266 MB Total size
./JW300_latest_xml_en-pcm.xml.gz ... 100% of 256 KB
./JW300_latest_xml_en.zip ... 100% of 263 MB
./JW300_latest_xml_pcm.zip ... 100% of 3 MB
gzip: unrecognized option '--y'
Try `gzip --help' for more information.


In [6]:
# Download the global test set.
! wget https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-any.en
  
# And the specific test set for this language pair.
os.environ["trg"] = target_language 
os.environ["src"] = source_language 

! wget https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-$trg.en 
! mv test.en-$trg.en test.en
! wget https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-$trg.$trg 
! mv test.en-$trg.$trg test.$trg

--2020-02-09 12:37:41--  https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-any.en
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.24.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.24.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 277791 (271K) [text/plain]
Saving to: ‘test.en-any.en’


2020-02-09 12:37:41 (35.2 MB/s) - ‘test.en-any.en’ saved [277791/277791]

--2020-02-09 12:37:41--  https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-pcm.en
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.24.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.24.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 154369 (151K) [text/plain]
Saving to: ‘test.en-pcm.en’


2020-02-09 12:37:42 (29.3 MB/s) - ‘test.en-pcm.en’ saved [154369/154369]

--2020-02-09 12:

In [7]:
# Read the test data to filter from train and dev splits.
# Store english portion in set for quick filtering checks.
en_test_sents = set()
filter_test_sents = "test.en-any.en"
j = 0
with open(filter_test_sents) as f:
  for line in f:
    en_test_sents.add(line.strip())
    j += 1
print('Loaded {} global test sentences to filter from the training/dev data.'.format(j))

Loaded 3571 global test sentences to filter from the training/dev data.


In [8]:
import pandas as pd

# Moses-format files (one sentence per line, written by opus_read above) to dataframe
source_file = 'jw300.' + source_language
target_file = 'jw300.' + target_language

source = []
target = []
skip_lines = set()  # Line numbers skipped on the source side, so the same lines can be skipped on the target side.
with open(source_file) as f:
    for i, line in enumerate(f):
        # Skip sentences that are contained in the test set.
        if line.strip() not in en_test_sents:
            source.append(line.strip())
        else:
            skip_lines.add(i)
with open(target_file) as f:
    for j, line in enumerate(f):
        # Only add to corpus if corresponding source was not skipped.
        if j not in skip_lines:
            target.append(line.strip())
    
print('Loaded data and skipped {}/{} lines since contained in test set.'.format(len(skip_lines), i))
    
df = pd.DataFrame(zip(source, target), columns=['source_sentence', 'target_sentence'])
# If you get "TypeError: data argument can't be an iterator" (this depends on your pandas version), use the line below instead:
#df = pd.DataFrame(list(zip(source, target)), columns=['source_sentence', 'target_sentence'])
df.head(3)

Loaded data and skipped 2536/26020 lines since contained in test set.


| | source_sentence | target_sentence |
|---|---|---|
| 0 | 3 Settle Differences in a Spirit of Love | 3 Make Una Dey Use Love Settle Quarrel |
| 1 | Because of our inherited imperfection , we are... | Because of the sin wey all of us carry from be... |
| 2 | This article shows how Bible principles can be... | This topic go show us how we fit let the thing... |
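The cell above keeps the two sides aligned by remembering which source line numbers were skipped. The same idea can be sketched in a single pass over both sides at once. The helper name `filter_parallel` and the in-memory lists are assumptions for illustration; the notebook itself reads from the `jw300.*` files.

```python
def filter_parallel(src_lines, tgt_lines, held_out):
    """Drop every sentence pair whose source side appears in the held-out test set."""
    kept_src, kept_tgt = [], []
    skipped = 0
    # zip walks both sides together, so a skipped source line
    # automatically skips its aligned target line as well.
    for src, tgt in zip(src_lines, tgt_lines):
        src = src.strip()
        if src in held_out:
            skipped += 1
            continue
        kept_src.append(src)
        kept_tgt.append(tgt.strip())
    return kept_src, kept_tgt, skipped

# Toy example; "b" plays the role of a global test sentence.
src_kept, tgt_kept, n_skipped = filter_parallel(
    ["a\n", "b\n", "c\n"], ["A\n", "B\n", "C\n"], held_out={"b"}
)
print(src_kept, tgt_kept, n_skipped)
```

Because both files are consumed in lockstep, there is no separate bookkeeping of line numbers, at the cost of assuming both files have the same number of lines.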


## Pre-processing and export

It is generally a good idea to remove duplicate translations and conflicting translations from the corpus. In practice, these public corpora include some number of these that need to be cleaned.

In addition we will split our data into dev/test/train and export to the filesystem.
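To illustrate what "duplicate" and "conflicting" mean here, the following plain-Python sketch applies the same three passes as the pandas cell below: exact duplicates first, then repeated sources, then repeated targets, keeping the first occurrence each time. The helper `keep_first` and the toy pairs are assumptions for illustration only.

```python
def keep_first(pairs, key):
    """Keep only the first pair for each distinct key, preserving order."""
    seen, kept = set(), []
    for pair in pairs:
        k = key(pair)
        if k not in seen:
            seen.add(k)
            kept.append(pair)
    return kept

toy_pairs = [
    ("hello", "x"),
    ("hello", "x"),   # exact duplicate
    ("hello", "y"),   # same source, different target: a conflicting translation
    ("hi", "x"),      # different source, same target
    ("bye", "z"),
]

cleaned = keep_first(toy_pairs, key=lambda p: p)      # exact duplicates
cleaned = keep_first(cleaned, key=lambda p: p[0])     # one target per source
cleaned = keep_first(cleaned, key=lambda p: p[1])     # one source per target
print(cleaned)
```

Note that the last two passes are aggressive: they also remove legitimate paraphrases, which is why the notebook marks them as optional for small corpora.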

In [9]:
# drop duplicate translations
df_pp = df.drop_duplicates()

# drop conflicting translations
# (this is optional and something that you might want to comment out 
# depending on the size of your corpus)
# Reassigning instead of using inplace=True avoids pandas' SettingWithCopyWarning.
df_pp = df_pp.drop_duplicates(subset='source_sentence')
df_pp = df_pp.drop_duplicates(subset='target_sentence')

# Shuffle the data to remove bias in dev set selection.
df_pp = df_pp.sample(frac=1, random_state=seed).reset_index(drop=True)


In [10]:
# Install fuzzy wuzzy to remove "almost duplicate" sentences in the
# test and training sets.
! pip install fuzzywuzzy
! pip install python-Levenshtein
import time
from fuzzywuzzy import process
import numpy as np

# reset the index of the training set after previous filtering
df_pp.reset_index(drop=False, inplace=True)

# Remove samples from the training data set if they "almost overlap" with the
# samples in the test set.

# Filtering function. Adjust pad to narrow down the candidate matches to
# within a certain length of characters of the given sample.
def fuzzfilter(sample, candidates, pad):
  candidates = [x for x in candidates if len(x) <= len(sample)+pad and len(x) >= len(sample)-pad] 
  if len(candidates) > 0:
    return process.extractOne(sample, candidates)[1]
  else:
    return np.nan

# NOTE - This might run slowly depending on the size of your training set. We are
# printing some information to help you track how long it would take. 
scores = []
test_candidates = list(en_test_sents)  # Build the candidate list once instead of on every iteration.
start_time = time.time()
for idx, row in df_pp.iterrows():
  scores.append(fuzzfilter(row['source_sentence'], test_candidates, 5))
  if idx % 1000 == 0:
    hours, rem = divmod(time.time() - start_time, 3600)
    minutes, seconds = divmod(rem, 60)
    print("{:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds), "%0.2f percent complete" % (100.0*float(idx)/float(len(df_pp))))

# Filter out "almost overlapping samples"
df_pp['scores'] = scores
df_pp = df_pp[df_pp['scores'] < 95]

Collecting fuzzywuzzy
  Downloading https://files.pythonhosted.org/packages/d8/f1/5a267addb30ab7eaa1beab2b9323073815da4551076554ecc890a3595ec9/fuzzywuzzy-0.17.0-py2.py3-none-any.whl
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.17.0
You are using pip version 10.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting python-Levenshtein
  Downloading https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz (48kB)
Building wheels for collected packages: python-Levenshtein
  Running setup.py bdist_wheel for python-Levenshtein ... done
  Stored in directory: /home/ec2-user/.cache/pip/wheels/de/c2/93/660fd5f7559049268ad2dc6d81c4e39e9e36518766eaf7e342
Successfully built python-Levenshtein
Installing collected pa



00:04:50.25 51.68 percent complete
00:05:15.71 56.38 percent complete
00:05:41.33 61.08 percent complete
00:06:07.34 65.78 percent complete
00:06:33.58 70.48 percent complete
00:06:59.31 75.18 percent complete
00:07:26.38 79.88 percent complete
00:07:53.15 84.57 percent complete
00:08:19.62 89.27 percent complete
00:08:45.67 93.97 percent complete




00:09:11.83 98.67 percent complete
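The near-duplicate filter above has two steps: narrow the test sentences to those within `pad` characters of the sample's length, then take the best similarity score. The sketch below shows that logic with the standard library's `difflib.SequenceMatcher` swapped in for fuzzywuzzy's scorer, so the scores will not match fuzzywuzzy's exactly. The name `best_match_score` is hypothetical.

```python
from difflib import SequenceMatcher

def best_match_score(sample, candidates, pad):
    """Best 0-100 similarity between sample and any candidate of comparable length."""
    # Length window: only compare against candidates within `pad` characters.
    window = [c for c in candidates if abs(len(c) - len(sample)) <= pad]
    if not window:
        return None  # nothing comparable; the notebook cell uses np.nan here
    return max(100 * SequenceMatcher(None, sample, c).ratio() for c in window)

test_side = ["At noon , they slowly walked home ."]
near = best_match_score("At noon , they slowly walk home .", test_side, pad=5)
far = best_match_score("Hello", test_side, pad=5)
print(near, far)
```

One detail worth knowing about the notebook cell: a `np.nan` score compares as `False` against `< 95`, so rows with no length-comparable test sentence are also dropped by the final filter.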


In [11]:
# This section does the split between train/dev for the parallel corpora then saves them as separate files
# We hold out 1000 sentence pairs as the dev set and use the given test set.
import csv

# Do the split between dev/train and create parallel corpora
num_dev_patterns = 1000

# Optional: lower case the corpora - this will make it easier to generalize, but without proper casing.
if lc:  # Julia: making lowercasing optional
    df_pp["source_sentence"] = df_pp["source_sentence"].str.lower()
    df_pp["target_sentence"] = df_pp["target_sentence"].str.lower()

# Julia: test sets are already generated
dev = df_pp.tail(num_dev_patterns) # Herman: Error in original
stripped = df_pp.drop(df_pp.tail(num_dev_patterns).index)

with open("train."+source_language, "w") as src_file, open("train."+target_language, "w") as trg_file:
  for index, row in stripped.iterrows():
    src_file.write(row["source_sentence"]+"\n")
    trg_file.write(row["target_sentence"]+"\n")
    
with open("dev."+source_language, "w") as src_file, open("dev."+target_language, "w") as trg_file:
  for index, row in dev.iterrows():
    src_file.write(row["source_sentence"]+"\n")
    trg_file.write(row["target_sentence"]+"\n")

#stripped[["source_sentence"]].to_csv("train."+source_language, header=False, index=False)  # Herman: Added `header=False` everywhere
#stripped[["target_sentence"]].to_csv("train."+target_language, header=False, index=False)  # Julia: Problematic handling of quotation marks.

#dev[["source_sentence"]].to_csv("dev."+source_language, header=False, index=False)
#dev[["target_sentence"]].to_csv("dev."+target_language, header=False, index=False)

# Doublecheck the format below. There should be no extra quotation marks or weird characters.
! head train.*
! head dev.*

==> train.en <==
JEHOVAH’S servants highly esteem God’s own holy book , the Bible .
At noon , they slowly walk home .
To put the woman at ease , Jesus kindly said : “ Take courage , daughter ! ”
( Compare Ezekiel 28 : 17 . )
God’s active force is a very powerful source of comfort .
What can we do to make wise use of our freedom ?
Are you young or up in years ?
“ People in our area are becoming more and more radical , ” notes one traveling overseer .
This set Adam apart from the animals , since they live according to instinct .
This , of course , was no accident .

==> train.pcm <==
JEHOVAH people value Bible well well because dem know sey na God holy book .
Around twelve o’clock , they go waka sofri - sofri go house .
Jesus come tell am wetin go make am no fear again . E tell am sey : ‘ No fear , my pikin ! ’
( Still check Ezekiel 28 : 17 . )
This one na one way wey e take dey help us .
Wetin go help us use our freedom well ?
You dey young or you don old ?
One circuit overseer talk say



---


## Installation of JoeyNMT

JoeyNMT is a simple, minimalist NMT package which is useful for learning and teaching. Check out the documentation for JoeyNMT [here](https://joeynmt.readthedocs.io)  

In [12]:
# Install JoeyNMT
! git clone https://github.com/joeynmt/joeynmt.git
! cd joeynmt; pip3 install .

Cloning into 'joeynmt'...
remote: Enumerating objects: 93, done.
remote: Counting objects: 100% (93/93), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 2277 (delta 56), reused 47 (delta 26), pack-reused 2184
Receiving objects: 100% (2277/2277), 2.63 MiB | 5.06 MiB/s, done.
Resolving deltas: 100% (1577/1577), done.
Processing /home/ec2-user/SageMaker/masakhane/joeynmt
Collecting torch>=1.1 (from joeynmt==0.0.1)
  Downloading https://files.pythonhosted.org/packages/24/19/4804aea17cd136f1705a5e98a00618cb8f6ccc375ad8bfa437408e09d058/torch-1.4.0-cp36-cp36m-manylinux1_x86_64.whl (753.4MB)

Collecting portalocker (from sacrebleu>=1.3.6->joeynmt==0.0.1)
  Downloading https://files.pythonhosted.org/packages/91/db/7bc703c0760df726839e0699b7f78a4d8217fdc9c7fcb1b51b39c5a22a4e/portalocker-1.5.2-py2.py3-none-any.whl
Collecting typing (from sacrebleu>=1.3.6->joeynmt==0.0.1)
  Downloading https://files.pythonhosted.org/packages/fe/2e/b480ee1b75e6d17d2993738670e75c1feeb9ff7f64452153cf018051cc92/typing-3.7.4.1-py3-none-any.whl
Collecting h5py (from keras-applications>=1.0.8->tensorflow>=1.14->joeynmt==0.0.1)
  Downloading https://files.pythonhosted.org/packages/60/06/cafdd44889200e5438b897388f3075b52a8ef01f28a17366d91de0fa2d05/h5py-2.10.0-cp36-cp36m-manylinux1_x86_64.whl (2.9MB)
Collecting werkzeug>=0.11.15 (from tensorboard<2.2.0,>=2.1.0->tensorflow>=1.14->joeynmt==0.0.1)
  Downloading https://files.pythonhosted.org/packages/ba/a5/d6f8a6e71f15364d35678a4ec8a0186f980b3bd2545f40ad51dd26a87fb1/W

# Preprocessing the Data into Subword BPE Tokens

- One of the most powerful improvements for agglutinative languages (a feature of most Bantu languages) is using BPE tokenization [(Sennrich, 2015)](https://arxiv.org/abs/1508.07909).

- It was also shown that optimizing the number of BPE codes significantly improves results for low-resourced languages [(Sennrich, 2019)](https://www.aclweb.org/anthology/P19-1021) [(Martinus, 2019)](https://arxiv.org/abs/1906.05685)

- Below we have the scripts for doing BPE tokenization of our data. We use 4000 tokens as recommended by [(Sennrich, 2019)](https://www.aclweb.org/anthology/P19-1021). You do not need to change anything; simply running the cells below is sufficient.
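For intuition about what "learning BPE codes" means, here is a toy sketch of the merge-learning loop: start from characters, repeatedly count adjacent symbol pairs, and merge the most frequent pair into a new symbol. This is a simplified illustration and not the `subword-nmt` implementation; the function names and the toy word frequencies are assumptions.

```python
from collections import Counter

def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Return the list of learned merges and the final segmented vocabulary."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab

toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(toy_freqs, num_merges=3)
print(merges)
print(segmented)
```

In this toy corpus the frequent ending "est" becomes a single symbol after three merges, while rarer words stay split into smaller pieces. The `-s 4000` flag in the cell below plays the role of `num_merges`.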

In [13]:
!sudo pip3 install subword-nmt

Collecting subword-nmt
  Downloading https://files.pythonhosted.org/packages/74/60/6600a7bc09e7ab38bc53a48a20d8cae49b837f93f5842a41fe513a694912/subword_nmt-0.3.7-py2.py3-none-any.whl
Installing collected packages: subword-nmt
  The script subword-nmt is installed in '/usr/local/bin' which is not on PATH.
Successfully installed subword-nmt-0.3.7
You are using pip version 19.0.2, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [14]:
# One of the huge boosts in NMT performance was to use a different method of tokenizing. 
# Usually, NMT would tokenize by words. However, using a method called BPE gave amazing boosts to performance

# Do subword NMT
from os import path
os.environ["src"] = source_language # Sets them in bash as well, since we often use bash scripts
os.environ["tgt"] = target_language

# Learn BPEs on the training data.
os.environ["data_path"] = path.join("joeynmt", "data", source_language + target_language) # Herman! 
! subword-nmt learn-joint-bpe-and-vocab --input train.$src train.$tgt -s 4000 -o bpe.codes.4000 --write-vocabulary vocab.$src vocab.$tgt

# Apply BPE splits to the training, development and test data.
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < train.$src > train.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < train.$tgt > train.bpe.$tgt

! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < dev.$src > dev.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < dev.$tgt > dev.bpe.$tgt
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < test.$src > test.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < test.$tgt > test.bpe.$tgt

# Create directory, move everything we care about to the correct location
! mkdir -p $data_path
! cp train.* $data_path
! cp test.* $data_path
! cp dev.* $data_path
! cp bpe.codes.4000 $data_path
! ls $data_path

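# Also move everything we care about to a mounted location in google drive (relevant if running in colab) at experiment_path
! cp train.* "$experiment_path"
! cp test.* "$experiment_path"
! cp dev.* "$experiment_path"
# The original copied to "$gexperiment_path", an unset variable, hence the `cp: cannot create regular file` error in the recorded output below.
! cp bpe.codes.4000 "$experiment_path"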
! ls "$experiment_path"

# Create that vocab using build_vocab
! sudo chmod 777 joeynmt/scripts/build_vocab.py
! joeynmt/scripts/build_vocab.py joeynmt/data/$src$tgt/train.bpe.$src joeynmt/data/$src$tgt/train.bpe.$tgt --output_path joeynmt/data/$src$tgt/vocab.txt

# Some output
! echo "BPE Pidgin Sentences"
! tail -n 5 test.bpe.$tgt
! echo "Combined BPE Vocab"
! tail -n 10 joeynmt/data/$src$tgt/vocab.txt  # Herman

bpe.codes.4000	dev.en	     test.bpe.pcm    test.pcm	    train.en
dev.bpe.en	dev.pcm      test.en	     train.bpe.en   train.pcm
dev.bpe.pcm	test.bpe.en  test.en-any.en  train.bpe.pcm
cp: cannot create regular file ‘’: No such file or directory
dev.bpe.en   dev.pcm	   test.en	   train.bpe.en   train.pcm
dev.bpe.pcm  test.bpe.en   test.en-any.en  train.bpe.pcm
dev.en	     test.bpe.pcm  test.pcm	   train.en
BPE Pidgin Sentences
( b ) Wetin you go get for mind sey you no go do ?
book . “ Na wetin I learn for Proverbs 27 : ​ 11 , Matthew 26 : ​ 52 , and John 13 : ​ 35 , give me mind wey mek I no join army .
And even when the wahala come , na wetin dey for this Bible verse dem , still help me fit bear the wahala . ” ​ — An@@ dri@@ y , wey come from U@@ k@@ ra@@ ine .
“ Na wetin dey Isaiah 2 : 4 help me mek I no follow fight war , even when dem want mek I join .
I just dey think for my mind how trouble no go dey the new world .
Combined BPE Vocab
ʺ
satisfac@@
righte@@
ʼ@@
selves
fts
circu@@
ga
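In the BPE output above, `@@` marks a subword that continues into the next token (for example `An@@ dri@@ y`). To read model output as normal text, the markers have to be removed again. A minimal sketch, assuming the `@@ ` separator convention seen in the output; the helper name `undo_bpe` is hypothetical:

```python
import re

def undo_bpe(line):
    """Join BPE subwords back into words by removing the '@@ ' continuation markers."""
    # Remove "@@ " inside the line, and a dangling "@@" at the very end.
    return re.sub(r"(@@ )|(@@ ?$)", "", line)

restored = undo_bpe("An@@ dri@@ y , wey come from U@@ k@@ ra@@ ine .")
print(restored)
```

Lines without any markers pass through unchanged, so the function is safe to apply to every line of a file.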

# Creating the JoeyNMT Config

JoeyNMT requires a YAML config. We provide a template below, with a number of defaults that you may play with!

- We use the Transformer architecture
- We set our dropout reasonably high: 0.3 (recommended in [(Sennrich, 2019)](https://www.aclweb.org/anthology/P19-1021))

Things worth playing with:
- The batch size (also recommended to change for low-resourced languages)
- The number of epochs (the template below sets 200; dropping it to around 30 is enough for a quick test run)
- The decoder options (beam_size, alpha)
- Evaluation metrics (BLEU versus chrF)

In [None]:
# This creates the config file for our JoeyNMT system. It might seem overwhelming so we've provided a couple of useful parameters you'll need to update
# (You can of course play with all the parameters if you'd like!)

name = '%s%s' % (source_language, target_language)
gdrive_path = os.environ["experiment_path"]

# Create the config
config = """
name: "{name}_transformer"

data:
    src: "{source_language}"
    trg: "{target_language}"
    train: "data/{name}/train.bpe"
    dev:   "data/{name}/dev.bpe"
    test:  "data/{name}/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "data/{name}/vocab.txt"
    trg_vocab: "data/{name}/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    #load_model: "{experiment_path}/models/{name}_transformer/1.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau"           # TODO: try switching from plateau to Noam scheduling
    patience: 5                     # For plateau: decrease learning rate by decrease_factor if validation score has not improved for this many validation rounds.
    learning_rate_factor: 0.5       # factor for Noam scheduler (used with Transformer)
    learning_rate_warmup: 1000      # warmup steps for Noam scheduler (used with Transformer)
    decrease_factor: 0.7
    loss: "crossentropy"
    learning_rate: 0.0003
    learning_rate_min: 0.00000001
    weight_decay: 0.0
    label_smoothing: 0.1
    batch_size: 4096
    batch_type: "token"
    eval_batch_size: 3600
    eval_batch_type: "token"
    batch_multiplier: 1
    early_stopping_metric: "ppl"
    epochs: 200                     # TODO: Decrease when playing around and checking that things work. Around 30 is sufficient to check if it's working at all.
    validation_freq: 1000          # TODO: Set to at least once per epoch.
    logging_freq: 100
    eval_metric: "bleu"
    model_dir: "models/{name}_transformer"
    overwrite: True               # TODO: Set to True if you want to overwrite possibly existing models. 
    shuffle: True
    use_cuda: True
    max_output_length: 100
    print_valid_sents: [0, 1, 2, 3]
    keep_last_ckpts: 5

model:
    initializer: "xavier"
    bias_initializer: "zeros"
    init_gain: 1.0
    embed_initializer: "xavier"
    embed_init_gain: 1.0
    tied_embeddings: True
    tied_softmax: True
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4             # TODO: Increase to 8 for larger data.
        embeddings:
            embedding_dim: 256   # TODO: Increase to 512 for larger data.
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256         # TODO: Increase to 512 for larger data.
        ff_size: 1024            # TODO: Increase to 2048 for larger data.
        dropout: 0.3
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4              # TODO: Increase to 8 for larger data.
        embeddings:
            embedding_dim: 256    # TODO: Increase to 512 for larger data.
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256         # TODO: Increase to 512 for larger data.
        ff_size: 1024            # TODO: Increase to 2048 for larger data.
        dropout: 0.3
""".format(name=name, experiment_path=os.environ["experiment_path"], source_language=source_language, target_language=target_language)
with open("joeynmt/configs/transformer_{name}.yaml".format(name=name),'w') as f:
    f.write(config)
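The `plateau` settings in the config (`patience`, `decrease_factor`, `learning_rate_min`) are easier to reason about with a small simulation. The sketch below follows one plausible reading of the config comment: after `patience` validation rounds without improvement, multiply the learning rate by `decrease_factor`. It is an illustration only, not JoeyNMT's scheduler code, and the function name and example scores are made up.

```python
def plateau_schedule(scores, learning_rate=0.0003, patience=5,
                     decrease_factor=0.7, learning_rate_min=1e-8,
                     lower_is_better=True):
    """Return the learning rate in effect after each validation round."""
    best = None
    rounds_without_improvement = 0
    history = []
    for score in scores:
        if best is None:
            improved = True
        elif lower_is_better:
            improved = score < best
        else:
            improved = score > best
        if improved:
            best = score
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                # Decay, but never go below the configured minimum.
                learning_rate = max(learning_rate * decrease_factor, learning_rate_min)
                rounds_without_improvement = 0
        history.append(learning_rate)
    return history

# Hypothetical validation perplexities: improvement stalls after the second round.
lr_history = plateau_schedule([10, 9, 9, 9, 9, 9, 9])
print(lr_history)
```

With the defaults from the config, the rate stays at 0.0003 until five consecutive rounds fail to improve, then drops to about 0.00021. Perplexity is treated as lower-is-better, matching `early_stopping_metric: "ppl"`.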

# Train the Model

A single JoeyNMT command runs the training using the config we made above.

##### Extra Installs
Because I am running on AWS, I need to do some extra installs because of environment conflicts. 

In [16]:
!conda install pytorch torchvision cudatoolkit=10.1 -c pytorch --yes

Solving environment: done


  current version: 4.5.12
  latest version: 4.8.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs: 
    - cudatoolkit=10.1
    - pytorch
    - torchvision


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.1.1   |                0         132 KB
    pytorch-1.4.0              |py3.6_cuda10.1.243_cudnn7.6.3_0       432.9 MB  pytorch
    openssl-1.0.2u             |       h7b6447c_0         3.1 MB
    certifi-2019.11.28         |           py36_0         156 KB
    torchvision-0.5.0          |       py36_cu101         9.1 MB  pytorch
    ------------------------------------------------------------
                                           Total:       445.4 MB

The following NEW packages will be INSTALLE

In [17]:
!conda install tensorboard --yes

Solving environment: done





## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs: 
    - tensorboard


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    c-ares-1.15.0              |    h7b6447c_1001         102 KB
    absl-py-0.8.1              |           py36_0         161 KB
    markdown-3.1.1             |           py36_0         113 KB
    numpy-1.18.1               |   py36h4f9e942_0           5 KB
    scikit-learn-0.22.1        |   py36hd81dba3_0         7.1 MB
    tensorboard-2.0.0          |     pyhb38c66f_1         3.3 MB
    mkl-2020.0                 |              166       202.1 MB
    scipy-1.4.1                |   py36h0b6359f_0        18.9 MB
    numpy-base-1.18.1          |   py36hde5b4d6_

In [18]:
!conda install -c pytorch torchtext --yes

Solving environment: done





## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs: 
    - torchtext


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    tqdm-4.42.0                |             py_0          56 KB
    torchtext-0.5.0            |             py_1         1.4 MB  pytorch
    ------------------------------------------------------------
                                           Total:         1.5 MB

The following NEW packages will be INSTALLED:

    torchtext: 0.5.0-py_1  pytorch
    tqdm:      4.42.0-py_0        


Downloading and Extracting Packages
tqdm-4.42.0          | 56 KB     | ##################################### | 100% 
torchtext-0.5.0      | 1.4 MB    | #############################

In [19]:
!conda install -c powerai sentencepiece --yes

Solving environment: done





## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs: 
    - sentencepiece


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    astor-0.8.0                |           py36_0          45 KB
    tensorflow-2.0.0           |mkl_py36hef7ec59_0           3 KB
    sentencepiece-0.1.84       |   py36h6bb024c_0         3.1 MB  powerai
    keras-applications-1.0.8   |             py_0          33 KB
    tensorflow-base-2.0.0      |mkl_py36h9204916_0       100.9 MB
    gast-0.2.2                 |           py36_0         138 KB
    google-pasta-0.1.8         |             py_0          43 KB
    opt_einsum-3.1.0           |             py_0          54 KB
    _tflow_select-2.3.0        |   

In [20]:
!conda install -c powerai sacrebleu --yes

Solving environment: done





## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs: 
    - sacrebleu


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    sacrebleu-1.4.3            |             py_0          34 KB  powerai
    portalocker-1.5.2          |           py36_0          20 KB
    ------------------------------------------------------------
                                           Total:          54 KB

The following NEW packages will be INSTALLED:

    portalocker: 1.5.2-py36_0        
    sacrebleu:   1.4.3-py_0   powerai


Downloading and Extracting Packages
sacrebleu-1.4.3      | 34 KB     | ##################################### | 100% 
portalocker-1.5.2    | 20 KB     | #######################

##### Train the model

In [None]:
# Train the model
# You can press Ctrl-C to stop. And then run the next cell to save your checkpoints! 
!cd joeynmt; python3 -m joeynmt train configs/transformer_$src$tgt.yaml

2020-02-09 13:08:37,412 Hello! This is Joey-NMT.
2020-02-09 13:08:40,281 Total params: 12099840
2020-02-09 13:08:40,283 Trainable parameters: ['decoder.layer_norm.bias', 'decoder.layer_norm.weight', 'decoder.layers.0.dec_layer_norm.bias', 'decoder.layers.0.dec_layer_norm.weight', 'decoder.layers.0.feed_forward.layer_norm.bias', 'decoder.layers.0.feed_forward.layer_norm.weight', 'decoder.layers.0.feed_forward.pwff_layer.0.bias', 'decoder.layers.0.feed_forward.pwff_layer.0.weight', 'decoder.layers.0.feed_forward.pwff_layer.3.bias', 'decoder.layers.0.feed_forward.pwff_layer.3.weight', 'decoder.layers.0.src_trg_att.k_layer.bias', 'decoder.layers.0.src_trg_att.k_layer.weight', 'decoder.layers.0.src_trg_att.output_layer.bias', 'decoder.layers.0.src_trg_att.output_layer.weight', 'decoder.layers.0.src_trg_att.q_layer.bias', 'decoder.layers.0.src_trg_att.q_layer.weight', 'decoder.layers.0.src_trg_att.v_layer.bias', 'decoder.layers.0.src_trg_att.v_layer.weight', 'decoder.layers.0.trg_trg_att.k_l

2020-02-09 13:08:44,832 cfg.name                           : enpcm_transformer
2020-02-09 13:08:44,832 cfg.data.src                       : en
2020-02-09 13:08:44,832 cfg.data.trg                       : pcm
2020-02-09 13:08:44,832 cfg.data.train                     : data/enpcm/train.bpe
2020-02-09 13:08:44,832 cfg.data.dev                       : data/enpcm/dev.bpe
2020-02-09 13:08:44,832 cfg.data.test                      : data/enpcm/test.bpe
2020-02-09 13:08:44,832 cfg.data.level                     : bpe
2020-02-09 13:08:44,832 cfg.data.lowercase                 : False
2020-02-09 13:08:44,832 cfg.data.max_sent_length           : 100
2020-02-09 13:08:44,832 cfg.data.src_vocab                 : data/enpcm/vocab.txt
2020-02-09 13:08:44,833 cfg.data.trg_vocab                 : data/enpcm/vocab.txt
2020-02-09 13:08:44,833 cfg.testing.beam_size              : 5
2020-02-09 13:08:44,833 cfg.testing.alpha                  : 1.0
2020-02-09 13:08:44,833 cfg.training.random_seed           :

2020-02-09 13:10:51,461 Epoch   5: total training loss 712.52
2020-02-09 13:10:51,461 EPOCH 6
2020-02-09 13:10:53,275 Epoch   6 Step:     1100 Batch Loss:     3.265706 Tokens per Sec:    21850, Lr: 0.000300
2020-02-09 13:11:03,098 Epoch   6 Step:     1200 Batch Loss:     3.291728 Tokens per Sec:    21540, Lr: 0.000300
2020-02-09 13:11:12,990 Epoch   6 Step:     1300 Batch Loss:     2.943817 Tokens per Sec:    20834, Lr: 0.000300
2020-02-09 13:11:12,991 Epoch   6: total training loss 684.58
2020-02-09 13:11:12,991 EPOCH 7
2020-02-09 13:11:22,883 Epoch   7 Step:     1400 Batch Loss:     2.914759 Tokens per Sec:    21785, Lr: 0.000300
2020-02-09 13:11:32,637 Epoch   7 Step:     1500 Batch Loss:     3.037468 Tokens per Sec:    21424, Lr: 0.000300
2020-02-09 13:11:34,103 Epoch   7: total training loss 649.63
2020-02-09 13:11:34,104 EPOCH 8
2020-02-09 13:11:42,433 Epoch   8 Step:     1600 Batch Loss:     2.900786 Tokens per Sec:    21571, Lr: 0.000300
2020-02-09 13:11:52,268 Epoch   8 Step: 

2020-02-09 13:15:15,027 EPOCH 17
2020-02-09 13:15:19,493 Epoch  17 Step:     3500 Batch Loss:     2.162553 Tokens per Sec:    21523, Lr: 0.000300
2020-02-09 13:15:29,371 Epoch  17 Step:     3600 Batch Loss:     2.326906 Tokens per Sec:    21915, Lr: 0.000300
2020-02-09 13:15:36,243 Epoch  17: total training loss 491.85
2020-02-09 13:15:36,244 EPOCH 18
2020-02-09 13:15:39,246 Epoch  18 Step:     3700 Batch Loss:     2.380447 Tokens per Sec:    21543, Lr: 0.000300
2020-02-09 13:15:49,155 Epoch  18 Step:     3800 Batch Loss:     2.020103 Tokens per Sec:    21464, Lr: 0.000300
2020-02-09 13:15:57,601 Epoch  18: total training loss 487.15
2020-02-09 13:15:57,602 EPOCH 19
2020-02-09 13:15:59,043 Epoch  19 Step:     3900 Batch Loss:     2.084379 Tokens per Sec:    19882, Lr: 0.000300
2020-02-09 13:16:08,997 Epoch  19 Step:     4000 Batch Loss:     2.317714 Tokens per Sec:    21471, Lr: 0.000300
2020-02-09 13:16:19,854 Hooray! New best validation result [ppl]!
2020-02-09 13:16:19,855 Saving ne

2020-02-09 13:19:51,539 Epoch  28 Step:     6000 Batch Loss:     2.292132 Tokens per Sec:    22003, Lr: 0.000300
2020-02-09 13:20:03,538 Hooray! New best validation result [ppl]!
2020-02-09 13:20:03,539 Saving new checkpoint.
2020-02-09 13:20:03,726 Example #0
2020-02-09 13:20:03,726 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 13:20:03,726 	Reference:  E tell am sey : ‘ Everything wey e get dey your hand . ’
2020-02-09 13:20:03,726 	Hypothesis: Jehovah no let Job do wetin e do . But e no do wetin e do . E sey : ‘ Na you be the God wey dey do you . ’
2020-02-09 13:20:03,726 Example #1
2020-02-09 13:20:03,726 	Source:     Corinna said : “ We left our work area in the evening and walked to a railway station 25 kilometers ( 15 miles ) away .
2020-02-09 13:20:03,727 	Reference:  Corinna talk sey : “ We comot for where we dey work for evening come trek go where people dey enter train wey be 25 kil

2020-02-09 13:23:44,333 Epoch  37: total training loss 372.70
2020-02-09 13:23:44,333 EPOCH 38
2020-02-09 13:23:54,141 Epoch  38 Step:     8100 Batch Loss:     1.849720 Tokens per Sec:    21401, Lr: 0.000300
2020-02-09 13:24:04,016 Epoch  38 Step:     8200 Batch Loss:     1.739787 Tokens per Sec:    21653, Lr: 0.000300
2020-02-09 13:24:05,685 Epoch  38: total training loss 367.20
2020-02-09 13:24:05,685 EPOCH 39
2020-02-09 13:24:13,938 Epoch  39 Step:     8300 Batch Loss:     1.978657 Tokens per Sec:    21301, Lr: 0.000300
2020-02-09 13:24:23,831 Epoch  39 Step:     8400 Batch Loss:     1.544337 Tokens per Sec:    21534, Lr: 0.000300
2020-02-09 13:24:27,069 Epoch  39: total training loss 363.03
2020-02-09 13:24:27,069 EPOCH 40
2020-02-09 13:24:33,676 Epoch  40 Step:     8500 Batch Loss:     1.669919 Tokens per Sec:    21793, Lr: 0.000300
2020-02-09 13:24:43,529 Epoch  40 Step:     8600 Batch Loss:     0.820153 Tokens per Sec:    21331, Lr: 0.000300
2020-02-09 13:24:48,366 Epoch  40: to

2020-02-09 13:28:16,315 Epoch  49 Step:    10500 Batch Loss:     1.630366 Tokens per Sec:    21649, Lr: 0.000300
2020-02-09 13:28:25,449 Epoch  49: total training loss 325.25
2020-02-09 13:28:25,450 EPOCH 50
2020-02-09 13:28:26,177 Epoch  50 Step:    10600 Batch Loss:     1.349531 Tokens per Sec:    20812, Lr: 0.000300
2020-02-09 13:28:35,985 Epoch  50 Step:    10700 Batch Loss:     1.566842 Tokens per Sec:    21428, Lr: 0.000300
2020-02-09 13:28:45,809 Epoch  50 Step:    10800 Batch Loss:     1.492050 Tokens per Sec:    21841, Lr: 0.000300
2020-02-09 13:28:46,590 Epoch  50: total training loss 321.43
2020-02-09 13:28:46,590 EPOCH 51
2020-02-09 13:28:55,623 Epoch  51 Step:    10900 Batch Loss:     1.727898 Tokens per Sec:    21534, Lr: 0.000300
2020-02-09 13:29:05,462 Epoch  51 Step:    11000 Batch Loss:     1.638900 Tokens per Sec:    21320, Lr: 0.000300
2020-02-09 13:29:16,137 Example #0
2020-02-09 13:29:16,137 	Source:     Jehovah did not do that , but he allowed Satan to test Job ,

2020-02-09 13:33:01,390 Example #0
2020-02-09 13:33:01,390 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 13:33:01,390 	Reference:  E tell am sey : ‘ Everything wey e get dey your hand . ’
2020-02-09 13:33:01,390 	Hypothesis: Jehovah no let Satan do wetin e no like . E sey : ‘ Everything wey Satan dey do , na im be the true God wey dey give you hand . ’
2020-02-09 13:33:01,390 Example #1
2020-02-09 13:33:01,391 	Source:     Corinna said : “ We left our work area in the evening and walked to a railway station 25 kilometers ( 15 miles ) away .
2020-02-09 13:33:01,391 	Reference:  Corinna talk sey : “ We comot for where we dey work for evening come trek go where people dey enter train wey be 25 kilometer ( 15 miles ) from where the farm dey .
2020-02-09 13:33:01,391 	Hypothesis: Corinna come talk sey : “ We comot for where we dey stay for the prison , and we dey stay prison for 25 miles ( 15 km ) 

2020-02-09 13:36:57,564 Epoch  70 Step:    15100 Batch Loss:     1.466820 Tokens per Sec:    21558, Lr: 0.000300
2020-02-09 13:37:00,923 Epoch  70: total training loss 268.52
2020-02-09 13:37:00,923 EPOCH 71
2020-02-09 13:37:07,505 Epoch  71 Step:    15200 Batch Loss:     1.100581 Tokens per Sec:    21126, Lr: 0.000300
2020-02-09 13:37:17,430 Epoch  71 Step:    15300 Batch Loss:     1.263563 Tokens per Sec:    21414, Lr: 0.000300
2020-02-09 13:37:22,304 Epoch  71: total training loss 266.92
2020-02-09 13:37:22,304 EPOCH 72
2020-02-09 13:37:27,420 Epoch  72 Step:    15400 Batch Loss:     1.193709 Tokens per Sec:    20689, Lr: 0.000300
2020-02-09 13:37:37,383 Epoch  72 Step:    15500 Batch Loss:     0.802885 Tokens per Sec:    21147, Lr: 0.000300
2020-02-09 13:37:43,831 Epoch  72: total training loss 266.82
2020-02-09 13:37:43,832 EPOCH 73
2020-02-09 13:37:47,366 Epoch  73 Step:    15600 Batch Loss:     0.445664 Tokens per Sec:    21176, Lr: 0.000300
2020-02-09 13:37:57,363 Epoch  73 Ste

2020-02-09 13:41:21,836 Epoch  81: total training loss 241.66
2020-02-09 13:41:21,837 EPOCH 82
2020-02-09 13:41:30,702 Epoch  82 Step:    17600 Batch Loss:     1.071036 Tokens per Sec:    21079, Lr: 0.000210
2020-02-09 13:41:40,662 Epoch  82 Step:    17700 Batch Loss:     1.366563 Tokens per Sec:    21398, Lr: 0.000210
2020-02-09 13:41:43,343 Epoch  82: total training loss 239.91
2020-02-09 13:41:43,343 EPOCH 83
2020-02-09 13:41:50,607 Epoch  83 Step:    17800 Batch Loss:     0.447479 Tokens per Sec:    20887, Lr: 0.000210
2020-02-09 13:42:00,555 Epoch  83 Step:    17900 Batch Loss:     1.230339 Tokens per Sec:    21363, Lr: 0.000210
2020-02-09 13:42:04,916 Epoch  83: total training loss 239.55
2020-02-09 13:42:04,916 EPOCH 84
2020-02-09 13:42:10,509 Epoch  84 Step:    18000 Batch Loss:     1.350716 Tokens per Sec:    21147, Lr: 0.000210
2020-02-09 13:42:22,835 Example #0
2020-02-09 13:42:22,836 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Every

2020-02-09 13:46:07,578 Example #0
2020-02-09 13:46:07,578 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 13:46:07,578 	Reference:  E tell am sey : ‘ Everything wey e get dey your hand . ’
2020-02-09 13:46:07,578 	Hypothesis: Jehovah no let Satan suffer Job . But e let Satan do wetin e talk . E sey : ‘ Anybody wey get for hand , e don use im hand do wetin e want . ’
2020-02-09 13:46:07,578 Example #1
2020-02-09 13:46:07,578 	Source:     Corinna said : “ We left our work area in the evening and walked to a railway station 25 kilometers ( 15 miles ) away .
2020-02-09 13:46:07,578 	Reference:  Corinna talk sey : “ We comot for where we dey work for evening come trek go where people dey enter train wey be 25 kilometer ( 15 miles ) from where the farm dey .
2020-02-09 13:46:07,579 	Hypothesis: Corinna , wey come from prison , talk sey : “ We comot for where we dey stay , and we waka comot go where w

2020-02-09 13:49:56,998 Epoch 102: total training loss 213.19
2020-02-09 13:49:56,998 EPOCH 103
2020-02-09 13:50:01,802 Epoch 103 Step:    22100 Batch Loss:     1.089305 Tokens per Sec:    21327, Lr: 0.000147
2020-02-09 13:50:11,727 Epoch 103 Step:    22200 Batch Loss:     0.896127 Tokens per Sec:    21282, Lr: 0.000147
2020-02-09 13:50:18,547 Epoch 103: total training loss 207.89
2020-02-09 13:50:18,547 EPOCH 104
2020-02-09 13:50:21,751 Epoch 104 Step:    22300 Batch Loss:     0.938026 Tokens per Sec:    20639, Lr: 0.000147
2020-02-09 13:50:31,744 Epoch 104 Step:    22400 Batch Loss:     1.021022 Tokens per Sec:    21276, Lr: 0.000147
2020-02-09 13:50:40,010 Epoch 104: total training loss 205.96
2020-02-09 13:50:40,011 EPOCH 105
2020-02-09 13:50:41,742 Epoch 105 Step:    22500 Batch Loss:     0.934909 Tokens per Sec:    21536, Lr: 0.000147
2020-02-09 13:50:51,699 Epoch 105 Step:    22600 Batch Loss:     1.191996 Tokens per Sec:    21250, Lr: 0.000147
2020-02-09 13:51:01,456 Epoch 105:

2020-02-09 13:54:24,836 Epoch 114 Step:    24500 Batch Loss:     1.077271 Tokens per Sec:    21529, Lr: 0.000147
2020-02-09 13:54:34,761 Epoch 114 Step:    24600 Batch Loss:     0.907009 Tokens per Sec:    21087, Lr: 0.000147
2020-02-09 13:54:39,311 Epoch 114: total training loss 196.42
2020-02-09 13:54:39,311 EPOCH 115
2020-02-09 13:54:44,712 Epoch 115 Step:    24700 Batch Loss:     0.819712 Tokens per Sec:    21185, Lr: 0.000147
2020-02-09 13:54:54,651 Epoch 115 Step:    24800 Batch Loss:     0.818831 Tokens per Sec:    21395, Lr: 0.000147
2020-02-09 13:55:00,797 Epoch 115: total training loss 196.22
2020-02-09 13:55:00,798 EPOCH 116
2020-02-09 13:55:04,582 Epoch 116 Step:    24900 Batch Loss:     1.006718 Tokens per Sec:    21151, Lr: 0.000147
2020-02-09 13:55:14,478 Epoch 116 Step:    25000 Batch Loss:     1.004069 Tokens per Sec:    21288, Lr: 0.000147
2020-02-09 13:55:26,365 Example #0
2020-02-09 13:55:26,366 	Source:     Jehovah did not do that , but he allowed Satan to test Job

2020-02-09 13:58:58,405 Epoch 125 Step:    27000 Batch Loss:     0.889066 Tokens per Sec:    21171, Lr: 0.000147
2020-02-09 13:59:11,081 Example #0
2020-02-09 13:59:11,081 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 13:59:11,081 	Reference:  E tell am sey : ‘ Everything wey e get dey your hand . ’
2020-02-09 13:59:11,081 	Hypothesis: Jehovah no let Satan do wetin e no like . E tell Job sey : ‘ Anybody wey dey your hand don weak . ’
2020-02-09 13:59:11,082 Example #1
2020-02-09 13:59:11,082 	Source:     Corinna said : “ We left our work area in the evening and walked to a railway station 25 kilometers ( 15 miles ) away .
2020-02-09 13:59:11,082 	Reference:  Corinna talk sey : “ We comot for where we dey work for evening come trek go where people dey enter train wey be 25 kilometer ( 15 miles ) from where the farm dey .
2020-02-09 13:59:11,082 	Hypothesis: Corinna , wey come from prison , talk

2020-02-09 14:03:06,791 Epoch 135 Step:    29100 Batch Loss:     0.807884 Tokens per Sec:    21170, Lr: 0.000103
2020-02-09 14:03:14,881 Epoch 135: total training loss 178.83
2020-02-09 14:03:14,881 EPOCH 136
2020-02-09 14:03:16,803 Epoch 136 Step:    29200 Batch Loss:     0.718593 Tokens per Sec:    21128, Lr: 0.000103
2020-02-09 14:03:26,802 Epoch 136 Step:    29300 Batch Loss:     0.791574 Tokens per Sec:    20882, Lr: 0.000103
2020-02-09 14:03:36,667 Epoch 136: total training loss 178.04
2020-02-09 14:03:36,667 EPOCH 137
2020-02-09 14:03:36,811 Epoch 137 Step:    29400 Batch Loss:     0.804677 Tokens per Sec:    16783, Lr: 0.000103
2020-02-09 14:03:46,736 Epoch 137 Step:    29500 Batch Loss:     0.883674 Tokens per Sec:    21302, Lr: 0.000103
2020-02-09 14:03:56,702 Epoch 137 Step:    29600 Batch Loss:     0.312471 Tokens per Sec:    21321, Lr: 0.000103
2020-02-09 14:03:58,207 Epoch 137: total training loss 176.31
2020-02-09 14:03:58,207 EPOCH 138
2020-02-09 14:04:06,750 Epoch 138 

2020-02-09 14:07:30,746 Epoch 146 Step:    31500 Batch Loss:     0.899355 Tokens per Sec:    21194, Lr: 0.000103
2020-02-09 14:07:36,292 Epoch 146: total training loss 171.23
2020-02-09 14:07:36,292 EPOCH 147
2020-02-09 14:07:40,723 Epoch 147 Step:    31600 Batch Loss:     0.714567 Tokens per Sec:    20765, Lr: 0.000103
2020-02-09 14:07:50,675 Epoch 147 Step:    31700 Batch Loss:     0.963298 Tokens per Sec:    21338, Lr: 0.000103
2020-02-09 14:07:57,909 Epoch 147: total training loss 171.62
2020-02-09 14:07:57,909 EPOCH 148
2020-02-09 14:08:00,636 Epoch 148 Step:    31800 Batch Loss:     0.639810 Tokens per Sec:    21075, Lr: 0.000103
2020-02-09 14:08:10,523 Epoch 148 Step:    31900 Batch Loss:     0.846164 Tokens per Sec:    21358, Lr: 0.000103
2020-02-09 14:08:19,535 Epoch 148: total training loss 171.46
2020-02-09 14:08:19,535 EPOCH 149
2020-02-09 14:08:20,573 Epoch 149 Step:    32000 Batch Loss:     0.518559 Tokens per Sec:    20187, Lr: 0.000103
2020-02-09 14:08:32,965 Example #0

2020-02-09 14:11:57,207 Epoch 157: total training loss 166.36
2020-02-09 14:11:57,207 EPOCH 158
2020-02-09 14:12:03,045 Epoch 158 Step:    34000 Batch Loss:     0.839570 Tokens per Sec:    21424, Lr: 0.000103
2020-02-09 14:12:15,485 Example #0
2020-02-09 14:12:15,485 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 14:12:15,485 	Reference:  E tell am sey : ‘ Everything wey e get dey your hand . ’
2020-02-09 14:12:15,485 	Hypothesis: Jehovah no let Satan do wetin e no like . E tell Job sey : ‘ Anybody wey dey do wetin e want , na im be your hand . ’
2020-02-09 14:12:15,486 Example #1
2020-02-09 14:12:15,486 	Source:     Corinna said : “ We left our work area in the evening and walked to a railway station 25 kilometers ( 15 miles ) away .
2020-02-09 14:12:15,486 	Reference:  Corinna talk sey : “ We comot for where we dey work for evening come trek go where people dey enter train wey be 25 kilometer

2020-02-09 14:16:08,591 Epoch 167 Step:    36100 Batch Loss:     0.651248 Tokens per Sec:    21533, Lr: 0.000072
2020-02-09 14:16:08,987 Epoch 167: total training loss 161.08
2020-02-09 14:16:08,988 EPOCH 168
2020-02-09 14:16:18,537 Epoch 168 Step:    36200 Batch Loss:     0.704232 Tokens per Sec:    21435, Lr: 0.000072
2020-02-09 14:16:28,401 Epoch 168 Step:    36300 Batch Loss:     0.684729 Tokens per Sec:    21561, Lr: 0.000072
2020-02-09 14:16:30,265 Epoch 168: total training loss 159.60
2020-02-09 14:16:30,265 EPOCH 169
2020-02-09 14:16:38,278 Epoch 169 Step:    36400 Batch Loss:     0.832892 Tokens per Sec:    21379, Lr: 0.000072
2020-02-09 14:16:48,172 Epoch 169 Step:    36500 Batch Loss:     0.603741 Tokens per Sec:    21275, Lr: 0.000072
2020-02-09 14:16:51,618 Epoch 169: total training loss 160.18
2020-02-09 14:16:51,618 EPOCH 170
2020-02-09 14:16:58,056 Epoch 170 Step:    36600 Batch Loss:     0.663849 Tokens per Sec:    21777, Lr: 0.000072
2020-02-09 14:17:07,895 Epoch 170 

2020-02-09 14:20:42,318 Epoch 179 Step:    38600 Batch Loss:     0.880630 Tokens per Sec:    21455, Lr: 0.000072
2020-02-09 14:20:51,587 Epoch 179: total training loss 158.21
2020-02-09 14:20:51,587 EPOCH 180
2020-02-09 14:20:52,223 Epoch 180 Step:    38700 Batch Loss:     0.715024 Tokens per Sec:    19638, Lr: 0.000072
2020-02-09 14:21:02,175 Epoch 180 Step:    38800 Batch Loss:     0.700444 Tokens per Sec:    21450, Lr: 0.000072
2020-02-09 14:21:12,087 Epoch 180 Step:    38900 Batch Loss:     0.672916 Tokens per Sec:    21168, Lr: 0.000072
2020-02-09 14:21:13,071 Epoch 180: total training loss 155.96
2020-02-09 14:21:13,072 EPOCH 181
2020-02-09 14:21:22,093 Epoch 181 Step:    39000 Batch Loss:     0.338834 Tokens per Sec:    21004, Lr: 0.000072
2020-02-09 14:21:34,349 Example #0
2020-02-09 14:21:34,349 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 14:21:34,349 	Reference:  E tell am sey : ‘ 

2020-02-09 14:25:17,237 Example #0
2020-02-09 14:25:17,238 	Source:     Jehovah did not do that , but he allowed Satan to test Job , stating : “ Everything that he has is in your hand . ”
2020-02-09 14:25:17,238 	Reference:  E tell am sey : ‘ Everything wey e get dey your hand . ’
2020-02-09 14:25:17,238 	Hypothesis: Jehovah no let Satan do wetin e no like . E tell Job sey : ‘ Anybody wey dey do wetin e want , na im be the true God . ’
2020-02-09 14:25:17,238 Example #1
2020-02-09 14:25:17,238 	Source:     Corinna said : “ We left our work area in the evening and walked to a railway station 25 kilometers ( 15 miles ) away .
2020-02-09 14:25:17,238 	Reference:  Corinna talk sey : “ We comot for where we dey work for evening come trek go where people dey enter train wey be 25 kilometer ( 15 miles ) from where the farm dey .
2020-02-09 14:25:17,238 	Hypothesis: Corinna , e come sey : ‘ We comot for where we dey stay . We dey stay there for where we dey go . We dey go preach for there .
20

2020-02-09 14:29:02,928 Epoch 199: total training loss 150.54
2020-02-09 14:29:02,928 EPOCH 200
2020-02-09 14:29:09,841 Epoch 200 Step:    43100 Batch Loss:     0.733609 Tokens per Sec:    21135, Lr: 0.000050
2020-02-09 14:29:19,853 Epoch 200 Step:    43200 Batch Loss:     0.636086 Tokens per Sec:    21286, Lr: 0.000050
2020-02-09 14:29:24,549 Epoch 200: total training loss 149.87
2020-02-09 14:29:24,550 Training ended after 200 epochs.
2020-02-09 14:29:24,550 Best validation result (greedy) at step    10000:   8.84 ppl.
2020-02-09 14:29:39,550  dev bleu:  12.78 [Beam search decoding with beam size = 5 and alpha = 1.0]
2020-02-09 14:29:39,551 Translations saved to: models/enpcm_transformer/00010000.hyps.dev
2020-02-09 14:30:02,666 test bleu:  24.29 [Beam search decoding with beam size = 5 and alpha = 1.0]
2020-02-09 14:30:02,667 Translations saved to: models/enpcm_transformer/00010000.hyps.test
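
The `Lr` field in the log excerpts above drops from 0.000300 to 0.000050 over the run. The logged values are consistent with the learning rate being multiplied by 0.7 each time it is reduced, as a reduce-on-plateau schedule with that decay factor would do. The sketch below only reproduces that arithmetic. The factor 0.7 is inferred from the logged values rather than read from the config, and the function name is hypothetical.

```python
def decayed_learning_rates(initial_lr, factor, num_decays):
    """Return the learning rate after 0, 1, ..., num_decays reductions."""
    return [initial_lr * factor ** k for k in range(num_decays + 1)]


# Assumed values: 0.0003 is the first Lr printed in the log;
# 0.7 is inferred from the logged values, not taken from the config file.
rates = decayed_learning_rates(initial_lr=0.0003, factor=0.7, num_decays=5)

# Format the same way the log does (six decimal places).
formatted = [f"{lr:.6f}" for lr in rates]
print(formatted)
```

The six formatted values match the `Lr` fields that appear in the log excerpts (0.000300, 0.000210, 0.000147, 0.000103, 0.000072, 0.000050).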


In [23]:
!mkdir -p "$experiment_path/models/${src}${tgt}_transformer/"

In [24]:
# Copy the created models from the notebook storage to Google Drive for persistent storage
!cp -r joeynmt/models/${src}${tgt}_transformer/* "$experiment_path/models/${src}${tgt}_transformer/"

In [25]:
# Output our validation scores (loss, perplexity, BLEU) for each validation step
! cat "$experiment_path/models/${src}${tgt}_transformer/validations.txt"

Steps: 1000	Loss: 73394.23438	PPL: 24.65521	bleu: 2.06072	LR: 0.00030000	*
Steps: 2000	Loss: 62560.94141	PPL: 15.36234	bleu: 4.45320	LR: 0.00030000	*
Steps: 3000	Loss: 56698.69531	PPL: 11.89271	bleu: 7.73914	LR: 0.00030000	*
Steps: 4000	Loss: 53580.67188	PPL: 10.37882	bleu: 8.94406	LR: 0.00030000	*
Steps: 5000	Loss: 51855.69531	PPL: 9.62574	bleu: 10.11587	LR: 0.00030000	*
Steps: 6000	Loss: 50861.65234	PPL: 9.21685	bleu: 11.02907	LR: 0.00030000	*
Steps: 7000	Loss: 50434.14062	PPL: 9.04638	bleu: 11.47610	LR: 0.00030000	*
Steps: 8000	Loss: 50059.23047	PPL: 8.89948	bleu: 12.05842	LR: 0.00030000	*
Steps: 9000	Loss: 50120.64844	PPL: 8.92338	bleu: 12.70072	LR: 0.00030000	
Steps: 10000	Loss: 49895.84375	PPL: 8.83621	bleu: 12.57659	LR: 0.00030000	*
Steps: 11000	Loss: 50398.80078	PPL: 9.03242	bleu: 12.10177	LR: 0.00030000	
Steps: 12000	Loss: 50478.47656	PPL: 9.06391	bleu: 12.89334	LR: 0.00030000	
Steps: 13000	Loss: 51401.37500	PPL: 9.43665	bleu: 12.72072	LR: 0.00030000	
Steps: 14000
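
To pick out the best checkpoint without reading the table by eye, the rows can be parsed in Python. The sketch below is a minimal example, assuming `validations.txt` keeps the tab-separated `Key: value` layout printed above; `SAMPLE` and `parse_validations` are illustrative names, and the rows are copied from the output above. It also checks one assumption the notebook does not state: that perplexity is computed as exp(loss / number of tokens), in which case `loss / ln(PPL)` should give the same token count on every row.

```python
import math

# Rows copied from the validations.txt output above (fields are tab-separated).
SAMPLE = """\
Steps: 1000\tLoss: 73394.23438\tPPL: 24.65521\tbleu: 2.06072\tLR: 0.00030000\t*
Steps: 2000\tLoss: 62560.94141\tPPL: 15.36234\tbleu: 4.45320\tLR: 0.00030000\t*
Steps: 3000\tLoss: 56698.69531\tPPL: 11.89271\tbleu: 7.73914\tLR: 0.00030000\t*
Steps: 4000\tLoss: 53580.67188\tPPL: 10.37882\tbleu: 8.94406\tLR: 0.00030000\t*
Steps: 5000\tLoss: 51855.69531\tPPL: 9.62574\tbleu: 10.11587\tLR: 0.00030000\t*
Steps: 6000\tLoss: 50861.65234\tPPL: 9.21685\tbleu: 11.02907\tLR: 0.00030000\t*
Steps: 7000\tLoss: 50434.14062\tPPL: 9.04638\tbleu: 11.47610\tLR: 0.00030000\t*
Steps: 8000\tLoss: 50059.23047\tPPL: 8.89948\tbleu: 12.05842\tLR: 0.00030000\t*
Steps: 9000\tLoss: 50120.64844\tPPL: 8.92338\tbleu: 12.70072\tLR: 0.00030000\t
Steps: 10000\tLoss: 49895.84375\tPPL: 8.83621\tbleu: 12.57659\tLR: 0.00030000\t*
Steps: 11000\tLoss: 50398.80078\tPPL: 9.03242\tbleu: 12.10177\tLR: 0.00030000\t
Steps: 12000\tLoss: 50478.47656\tPPL: 9.06391\tbleu: 12.89334\tLR: 0.00030000\t
Steps: 13000\tLoss: 51401.37500\tPPL: 9.43665\tbleu: 12.72072\tLR: 0.00030000\t
"""


def parse_validations(text):
    """Parse tab-separated 'Key: value' fields into one dict per row."""
    rows = []
    for line in text.splitlines():
        fields = [f for f in line.split("\t") if f.strip()]
        if not fields:
            continue
        # A trailing '*' marks a row that set a new best validation score.
        row = {"new_best": fields[-1].strip() == "*"}
        for field in fields:
            if ":" in field:
                key, value = field.split(":", 1)
                row[key.strip().lower()] = float(value)
        rows.append(row)
    return rows


rows = parse_validations(SAMPLE)

# Lowest perplexity among the parsed rows.
best = min(rows, key=lambda r: r["ppl"])
print("best step by PPL:", int(best["steps"]), "PPL:", best["ppl"])

# Assumption: PPL = exp(loss / tokens), so tokens = loss / ln(PPL).
implied_tokens = [r["loss"] / math.log(r["ppl"]) for r in rows]
print("implied dev tokens per row:", [round(t) for t in implied_tokens])
```

On these rows the lowest perplexity is at step 10000, which agrees with the "Best validation result" line in the training log, and every row implies roughly 22,900 dev tokens, so the assumed relation between loss and perplexity fits the numbers. Note that the rows with the highest BLEU and the lowest perplexity are not the same, so the choice of early-stopping metric matters.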

In [27]:
# Test our model
! cd joeynmt; python3 -m joeynmt test configs/transformer_$src$tgt.yaml

2020-02-09 14:43:30,786 Hello! This is Joey-NMT.
2020-02-09 14:43:49,425  dev bleu:  12.78 [Beam search decoding with beam size = 5 and alpha = 1.0]
2020-02-09 14:44:12,850 test bleu:  24.29 [Beam search decoding with beam size = 5 and alpha = 1.0]
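
The log reports beam search with beam size 5 and alpha 1.0. Alpha usually denotes the exponent of a length penalty that stops beam search from favouring short hypotheses. The sketch below shows one common form, the GNMT-style penalty ((5 + length) / 6)^alpha. Whether the JoeyNMT version used in this notebook applies exactly this formula is an assumption that should be checked against its source; the function names and the toy scores are hypothetical.

```python
def length_penalty(length, alpha):
    """GNMT-style length penalty; alpha = 0 disables normalization."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(sum_log_prob, length, alpha=1.0):
    """Divide a hypothesis' summed log-probability by its length penalty."""
    return sum_log_prob / length_penalty(length, alpha)


# Toy log-probabilities, not taken from the trained model.
short = normalized_score(-4.0, length=3)    # penalty 8/6
long = normalized_score(-6.0, length=13)    # penalty 18/6

# The longer hypothesis has the lower raw score but wins after normalization.
print(short, long, long > short)
```

With alpha set to 0 the penalty is 1 for every length, and the ranking falls back to the raw summed log-probabilities.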
