
Baseline Simultaneous Translation


These are instructions for training and evaluating a wait-k simultaneous LSTM model on the MuST-C English-German dataset.

STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework
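As a rough illustration (this is not fairseq code), the wait-k policy of STACL first reads k source tokens, then alternates between emitting one target token and reading one more source token until the source is exhausted:

```python
# Illustrative sketch of the wait-k read/write schedule.
def waitk_schedule(src_len, tgt_len, k):
    """Return, for each target step t, how many source tokens are visible."""
    return [min(k + t, src_len) for t in range(tgt_len)]

# With k=2, the model sees 2 source tokens before emitting the first
# target token, then one more source token per target token:
waitk_schedule(src_len=5, tgt_len=6, k=2)  # [2, 3, 4, 5, 5, 5]
```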

Requirements

Install fairseq (make sure to use the correct branch):

git clone --branch simulastsharedtask git@github.com:pytorch/fairseq.git
cd fairseq
pip install -e .

We assume below that fairseq is installed in a directory referenced by the environment variable $FAIRSEQ.

Install SentencePiece. One easy way is to use Anaconda:

conda install -c powerai sentencepiece

Download the MuST-C data for English-German, available at https://ict.fbk.eu/must-c/. We will assume that the data is downloaded to a directory referenced by $DATA_ROOT.
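The commands below assume the standard MuST-C directory layout under $DATA_ROOT (inferred from the paths used in this guide):

```
$DATA_ROOT/
└── data/
    ├── train/
    │   ├── txt/   # train.en, train.de
    │   └── wav/
    ├── dev/
    ├── tst-COMMON/
    └── tst-HE/
```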

Text-to-text Model


Data Preparation

Train a SentencePiece model:

for lang in en de; do
    python $FAIRSEQ/examples/simultaneous_translation/data/train_spm.py \
        --data-path $DATA_ROOT/data \
        --vocab-size 10000 \
        --max-frame 3000 \
        --model-type unigram \
        --lang $lang \
        --out-path .
done

Process the data with the SentencePiece model:

proc_dir=proc
mkdir -p $proc_dir
for split in train dev tst-COMMON tst-HE; do
    for lang in en de; do
        spm_encode \
            --model unigram-$lang-10000-3000/spm.model \
            < $DATA_ROOT/data/$split/txt/$split.$lang \
            > $proc_dir/$split.spm.$lang
    done
done

Binarize the data:

proc_dir=proc
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref $proc_dir/train.spm \
    --validpref $proc_dir/dev.spm \
    --testpref $proc_dir/tst-COMMON.spm \
    --thresholdtgt 0 \
    --thresholdsrc 0 \
    --workers 20 \
    --destdir ./data-bin/mustc_en_de

Training

mkdir -p checkpoints
CUDA_VISIBLE_DEVICES=1 python $FAIRSEQ/train.py data-bin/mustc_en_de \
    --save-dir checkpoints \
    --arch berard_simul_text_iwslt \
    --simul-type waitk \
    --waitk-lagging 2 \
    --optimizer adam \
    --max-epoch 100 \
    --lr 0.001 \
    --clip-norm 5.0  \
    --batch-size 128  \
    --log-format json \
    --log-interval 10 \
    --criterion cross_entropy_acc \
    --user-dir $FAIRSEQ/examples/simultaneous_translation
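With --log-format json, fairseq prints one JSON record per log interval. A minimal sketch for extracting the loss curve from a saved training log (redirecting the output to a file such as train.log is an assumption, not part of the commands above):

```python
import json

def parse_losses(path):
    """Extract (epoch, loss) pairs from a fairseq JSON-format training log.
    Lines that are not JSON records (e.g. startup banners) are skipped."""
    points = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line.startswith("{"):
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            if "loss" in rec and "epoch" in rec:
                points.append((rec["epoch"], float(rec["loss"])))
    return points
```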

Speech-to-text Model


Data Preparation

First, segment wav files.

python $FAIRSEQ/examples/simultaneous_translation/data/segment_wav.py \
    --datapath $DATA_ROOT

As with the text-to-text model, train a SentencePiece model, but only on the German text:

python $FAIRSEQ/examples/simultaneous_translation/data/train_spm.py \
    --data-path $DATA_ROOT/data \
    --vocab-size 10000 \
    --max-frame 3000 \
    --model-type unigram \
    --lang de \
    --out-path .

Training

mkdir -p checkpoints
CUDA_VISIBLE_DEVICES=1 python $FAIRSEQ/train.py data-bin/mustc_en_de \
    --save-dir checkpoints \
    --arch berard_simul_text_iwslt \
    --waitk-lagging 2 \
    --waitk-stride 10 \
    --input-feat-per-channel 40 \
    --encoder-hidden-size 512 \
    --output-layer-dim 128 \
    --decoder-num-layers 3 \
    --task speech_translation \
    --optimizer adam \
    --max-epoch 100 \
    --lr 0.001 \
    --clip-norm 5.0  \
    --batch-size 128  \
    --log-format json \
    --log-interval 10 \
    --criterion cross_entropy_acc \
    --user-dir $FAIRSEQ/examples/simultaneous_translation
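For speech input, the wait-k unit is not a single token: --waitk-stride groups encoder states so that each "read" step consumes a fixed stride of them (10 above). As a hypothetical illustration (not fairseq code), with k=2 and a stride of 10 the first target token is emitted only after 2 * 10 = 20 encoder states are available:

```python
# Illustrative sketch of wait-k with a stride over encoder states.
def speech_waitk_schedule(num_states, tgt_len, k, stride):
    """Return, per target step t, how many encoder states are visible."""
    return [min((k + t) * stride, num_states) for t in range(tgt_len)]

speech_waitk_schedule(num_states=55, tgt_len=5, k=2, stride=10)
# [20, 30, 40, 50, 55]
```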

Evaluation


Evaluation Server

For text translation models, the server is set up as follows, given an input file and a reference file:

python ./eval/server.py \
    --hostname localhost \
    --port 12321 \
    --src-file $DATA_ROOT/data/dev/txt/dev.en \
    --ref-file $DATA_ROOT/data/dev/txt/dev.de

For speech translation models, the input is the data directory:

python ./eval/server.py \
    --hostname localhost \
    --port 12321 \
    --ref-file $DATA_ROOT \
    --data-type speech

Decode and Evaluate with Client

Once the server is set up, run the client to evaluate translation quality and latency.
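The latency metric reported for wait-k systems in the STACL paper is Average Lagging (AL), which measures how many source tokens the decoder lags behind an ideal fully-simultaneous translator. A sketch of the computation (illustrative, not the evaluation client's code):

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging (AL) from the STACL paper.

    g[t] is the number of source tokens read before emitting target
    token t+1. The sum runs up to tau, the first step whose read covers
    the whole source; gamma = tgt_len / src_len rescales for length mismatch.
    """
    gamma = tgt_len / src_len
    tau = next(t for t, gt in enumerate(g) if gt >= src_len) + 1
    return sum(g[t] - t / gamma for t in range(tau)) / tau

# A wait-1 policy on equal-length sentences lags by exactly one token:
average_lagging([1, 2, 3, 4], src_len=4, tgt_len=4)  # 1.0
```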

# TEXT
python $FAIRSEQ/examples/simultaneous_translation/evaluate.py \
    data-bin/mustc_en_de \
    --user-dir $FAIRSEQ/examples/simultaneous_translation \
    --src-spm unigram-en-10000-3000/spm.model \
    --tgt-spm unigram-de-10000-3000/spm.model \
    -s en -t de \
    --path checkpoints/checkpoint_best.pt

# SPEECH
python $FAIRSEQ/examples/simultaneous_translation/evaluate.py \
    data-bin/mustc_en_de \
    --user-dir $FAIRSEQ/examples/simultaneous_translation \
    --data-type speech \
    --tgt-spm unigram-de-10000-3000/spm.model \
    -s en -t de \
    --path checkpoints/checkpoint_best.pt