arXiv:2010.05171

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

Published on Oct 11, 2020

Abstract

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (and online) inference. We implement state-of-the-art RNN-based, Transformer-based, and Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
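For context, the workflows described above are driven through fairseq's standard command-line tools. Below is a minimal sketch, assuming fairseq with the speech_to_text extension is installed and that a pre-processed data directory (manifest TSVs plus a config.yaml, as produced by the preparation scripts under examples/speech_to_text) already exists; the paths, subset names, and hyperparameter values are placeholders rather than the exact published recipes.

```python
"""Minimal sketch of a fairseq S2T train/decode run driven from Python.

Assumptions (not taken from the paper): fairseq with the speech_to_text
extension is installed, and DATA_DIR already contains the manifest TSVs and
config.yaml produced by the data-preparation scripts in
examples/speech_to_text. Subset names and hyperparameters are placeholders.
"""
import subprocess

DATA_DIR = "/path/to/s2t_data"      # placeholder: output of the prep scripts
SAVE_DIR = "/path/to/checkpoints"   # placeholder: where checkpoints are saved

# Train a small speech-to-text Transformer (--arch s2t_transformer_s) with
# label-smoothed cross-entropy, roughly following the shape of the
# LibriSpeech / MuST-C recipes in the repository.
subprocess.run(
    [
        "fairseq-train", DATA_DIR,
        "--task", "speech_to_text",
        "--config-yaml", "config.yaml",
        "--train-subset", "train",          # placeholder subset name
        "--valid-subset", "dev",            # placeholder subset name
        "--arch", "s2t_transformer_s",
        "--criterion", "label_smoothed_cross_entropy",
        "--label-smoothing", "0.1",
        "--optimizer", "adam",
        "--lr", "2e-3",
        "--lr-scheduler", "inverse_sqrt",
        "--warmup-updates", "10000",
        "--max-tokens", "40000",
        "--max-update", "100000",
        "--save-dir", SAVE_DIR,
    ],
    check=True,
)

# Offline inference: decode the test subset with beam search from the last
# checkpoint. (Online/simultaneous decoding follows a separate example in
# the repository.)
subprocess.run(
    [
        "fairseq-generate", DATA_DIR,
        "--task", "speech_to_text",
        "--config-yaml", "config.yaml",
        "--gen-subset", "test",             # placeholder subset name
        "--path", f"{SAVE_DIR}/checkpoint_last.pt",
        "--max-tokens", "50000",
        "--beam", "5",
    ],
    check=True,
)
```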

