Speech2S

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

  • (Updating) Nov. 2022: released the code and models
  • Nov. 2022: released the preprint on arXiv

Pre-Trained and Fine-tuned Models

Model         | Pre-training Dataset | Fine-tuning Dataset | Download
Speech2S_enes | Voxpopuli_en_v2      | -                   | Google Drive
Speech2S_enes | Voxpopuli_en_v2      | Voxpopuli_s2s       | Google Drive
Speech2S_esen | Voxpopuli_es_v2      | -                   | Google Drive
Speech2S_esen | Voxpopuli_es_v2      | Voxpopuli_s2s       | Google Drive

Setup

cd Speech2S/speech2s
pip install --editable fairseq/

Data Preparation

Please follow the data preparation steps for S2ST here.

Pre-Training

cd speech2s/stpretrain_scripts
bash base_sc2c_enes.sh

Finetune

cd speech2s/stpretrain_scripts
bash finetune_enes.sh

Inference

cd speech2s/stpretrain_scripts
bash inference_ed.sh
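
The pre-training, fine-tuning, and inference scripts all live in the same directory, so a small wrapper can catch a wrong path before a long job starts. The following is a minimal sketch, not part of the repository: `run_stage` is a hypothetical helper name, and it assumes each script can be run with `bash` and takes no extra arguments, which this README does not confirm. Check each script for paths or variables that need to be set first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by the repository):
# run one stage script from a scripts directory, failing early if it is missing.
#   $1 = directory containing the script
#   $2 = script file name
run_stage() {
    if [ ! -f "$1/$2" ]; then
        echo "missing script: $1/$2" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && bash "$2" )
}

# Example calls, using the paths and script names given in this README:
# run_stage speech2s/stpretrain_scripts base_sc2c_enes.sh
# run_stage speech2s/stpretrain_scripts finetune_enes.sh
# run_stage speech2s/stpretrain_scripts inference_ed.sh
```

Running each stage in a subshell keeps the relative paths valid for the next call, since the working directory of the calling shell never changes.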

Results on Voxpopuli and Covst

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on FAIRSEQ.

Microsoft Open Source Code of Conduct

Reference

If you find our work useful in your research, please cite the following paper:

@article{wei2022joint,
  title={Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation},
  author={Wei, Kun and Zhou, Long and Zhang, Ziqiang and Chen, Liping and Liu, Shujie and He, Lei and Li, Jinyu and Wei, Furu},
  journal={arXiv preprint arXiv:2210.17027},
  year={2022}
}