README.md · espnet/owsm_ctc_v3.1_1B at 9e55bb0b8fb13f2e4b764d099ae75a581d3f516f

metadata

tags:
  - espnet
  - audio
  - automatic-speech-recognition
  - speech-translation
  - language-identification
language: multilingual
datasets:
  - owsm_v3.1_ctc
license: cc-by-4.0

OWSM-CTC is an encoder-only speech foundation model based on multi-task self-conditioned CTC. It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, following the design of the earlier encoder-decoder OWSM.
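For readers unfamiliar with CTC, the snippet below is a minimal sketch of the generic greedy CTC collapse rule (merge consecutive repeats, then drop blanks). It is not taken from this repository and is not the model's inference code; the function name `ctc_greedy_collapse`, the `blank_id` default, and the toy token ids are all hypothetical and chosen only for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Turn per-frame argmax ids into a label sequence using the standard CTC rule.

    Illustrative helper only: the name and the blank_id default are assumptions,
    not part of ESPnet or this model's API.
    """
    # groupby merges runs of identical consecutive ids; blanks are then dropped.
    return [tok for tok, _ in groupby(frame_ids) if tok != blank_id]


# Toy per-frame predictions: the blank between the two runs of 3 keeps them distinct.
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_collapse(frames))  # [3, 3, 5]
```

Because an encoder-only CTC model emits one prediction per frame, a collapse step of this kind can replace an autoregressive decoder loop when producing the output sequence.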

Due to time constraints, the model used in the paper was trained for 40 "epochs". A new model trained for 45 "epochs" has also been added to this repo to match the setup of the encoder-decoder OWSM. It generally performs better than the 40-epoch model.