README.md · espnet/owsm_ctc_v3.1_1B at 3b3dddc51325e9a59ad3a6ce4ab1b62ea872ed29

---
tags:
  - espnet
  - audio
  - automatic-speech-recognition
  - speech-translation
  - language-identification
language: multilingual
datasets:
  - owsm_v3.1_ctc
license: cc-by-4.0
---

OWSM-CTC is an encoder-only speech foundation model based on multi-task self-conditioned CTC. Following the design of the earlier encoder-decoder OWSM, it is trained on 180k hours of public audio data for three tasks: multilingual speech recognition, any-to-any speech translation, and language identification.
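
Because the model is CTC-based, its encoder emits one score vector per audio frame instead of generating tokens one at a time. The sketch below illustrates the standard greedy CTC rule for turning such frame-level scores into a token sequence: take the best token per frame, merge consecutive repeats, then drop blanks. It is a toy illustration only, not this model's inference code. The function name `ctc_greedy_decode`, the blank index `BLANK_ID = 0`, and the three-symbol vocabulary are all assumptions made for the example.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank symbol. The real vocabulary may differ.
BLANK_ID = 0


def ctc_greedy_decode(frame_scores):
    """Collapse per-frame scores into token IDs using the greedy CTC rule."""
    # Step 1: pick the highest-scoring token ID for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Step 2: merge consecutive repeats. Step 3: drop blank symbols.
    return [token for token, _ in groupby(best_path) if token != BLANK_ID]


# Toy input: 6 frames over a hypothetical 3-symbol vocabulary (0 = blank).
toy_scores = [
    [0.10, 0.80, 0.10],  # best: 1
    [0.20, 0.70, 0.10],  # best: 1 (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # best: blank (separates the two 1s)
    [0.10, 0.80, 0.10],  # best: 1
    [0.10, 0.20, 0.70],  # best: 2
    [0.80, 0.10, 0.10],  # best: blank
]

decoded = ctc_greedy_decode(toy_scores)
print(decoded)
```

The blank between the second and fourth frames is what lets the same token appear twice in a row in the output. Without it, the repeated `1` frames would be merged into a single token.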