Spaces:
Runtime error
Runtime error
[[Back]](..) | |
# Joint Speech Text Training for the 2021 IWSLT multilingual speech translation | |
This directory contains the code from paper ["FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task"](https://arxiv.org/pdf/2107.06959.pdf). | |
## Prepare Data | |
#### Download files | |
- Sentence piece model [spm.model](https://dl.fbaipublicfiles.com/joint_speech_text_4_s2t/iwslt/iwslt_data/spm.model) | |
- Dictionary [tgt_dict.txt](https://dl.fbaipublicfiles.com/joint_speech_text_4_s2t/iwslt/iwslt_data/dict.txt) | |
- Config [config.yaml](https://dl.fbaipublicfiles.com/joint_speech_text_4_s2t/iwslt/iwslt_data/config.yaml) | |
#### Prepare | |
- [Please follow the data preparation in speech-to-text](https://github.com/pytorch/fairseq/blob/main/examples/speech_to_text/docs/mtedx_example.md) | |
## Training | |
#### Download pretrained models | |
- [Pretrained mbart model](https://dl.fbaipublicfiles.com/joint_speech_text_4_s2t/iwslt/iwslt_data/mbart.pt) | |
- [Pretrained w2v model](https://dl.fbaipublicfiles.com/joint_speech_text_4_s2t/iwslt/iwslt_data/xlsr_53_56k.pt) | |
#### Training scripts | |
```bash | |
python train.py ${MANIFEST_ROOT} \ | |
--save-dir ${save_dir} \ | |
--user-dir examples/speech_text_joint_to_text \ | |
--train-subset train_es_en_tedx,train_es_es_tedx,train_fr_en_tedx,train_fr_es_tedx,train_fr_fr_tedx,train_it_it_tedx,train_pt_en_tedx,train_pt_pt_tedx \ | |
--valid-subset valid_es_en_tedx,valid_es_es_tedx,valid_es_fr_tedx,valid_es_it_tedx,valid_es_pt_tedx,valid_fr_en_tedx,valid_fr_es_tedx,valid_fr_fr_tedx,valid_fr_pt_tedx,valid_it_en_tedx,valid_it_es_tedx,valid_it_it_tedx,valid_pt_en_tedx,valid_pt_es_tedx,valid_pt_pt_tedx \ | |
--config-yaml config.yaml --ddp-backend no_c10d \ | |
--num-workers 2 --task speech_text_joint_to_text \ | |
--criterion guided_label_smoothed_cross_entropy_with_accuracy \ | |
--label-smoothing 0.3 --guide-alpha 0.8 \ | |
--disable-text-guide-update-num 5000 --arch dualinputxmtransformer_base \ | |
--max-tokens 500000 --max-sentences 3 --max-tokens-valid 800000 \ | |
--max-source-positions 800000 --enc-grad-mult 2.0 \ | |
--attentive-cost-regularization 0.02 --optimizer adam \ | |
--clip-norm 1.0 --log-format simple --log-interval 200 \ | |
--keep-last-epochs 5 --seed 1 \ | |
--w2v-path ${w2v_path} \ | |
--load-pretrained-mbart-from ${mbart_path} \ | |
--max-update 1000000 --update-freq 4 \ | |
--skip-invalid-size-inputs-valid-test \ | |
--skip-encoder-projection --save-interval 1 \ | |
--attention-dropout 0.3 --mbart-dropout 0.3 \ | |
--finetune-w2v-params all --finetune-mbart-decoder-params all \ | |
--finetune-mbart-encoder-params all --stack-w2v-mbart-encoder \ | |
--drop-w2v-layers 12 --normalize \ | |
--lr 5e-05 --lr-scheduler inverse_sqrt --warmup-updates 5000 | |
``` | |
## Evaluation | |
```bash | |
python ./fairseq_cli/generate.py | |
${MANIFEST_ROOT} \ | |
--task speech_text_joint_to_text \ | |
--user-dir ./examples/speech_text_joint_to_text \ | |
--load-speech-only --gen-subset test_es_en_tedx \ | |
--path ${model} \ | |
--max-source-positions 800000 \ | |
--skip-invalid-size-inputs-valid-test \ | |
--config-yaml config.yaml \ | |
--infer-target-lang en \ | |
--max-tokens 800000 \ | |
--beam 5 \ | |
--results-path ${RESULTS_DIR} \ | |
--scoring sacrebleu | |
``` | |
The trained model can be downloaded [here](https://dl.fbaipublicfiles.com/joint_speech_text_4_s2t/iwslt/iwslt_data/checkpoint17.pt) | |
|direction|es_en|fr_en|pt_en|it_en|fr_es|pt_es|it_es|es_es|fr_fr|pt_pt|it_it| | |
|---|---|---|---|---|---|---|---|---|---|---|---| | |
|BLEU|31.62|36.93|35.07|27.12|38.87|35.57|34.13|74.59|74.64|70.84|69.76| | |