---
tags:
  - espnet
  - audio
  - automatic-speech-recognition
  - speech-translation
  - language-identification
language: multilingual
datasets:
  - owsm_v3.1_ctc
license: cc-by-4.0
---

OWSM-CTC is an encoder-only speech foundation model based on multi-task self-conditioned CTC. Following the design of the earlier encoder-decoder OWSM, it is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification.
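
A CTC model's encoder produces a score for every output token (plus a special "blank" symbol) at each audio frame, and these frame-level outputs must be collapsed into a token sequence. As a rough illustration only, here is a minimal sketch of generic greedy (best-path) CTC decoding in plain Python. It is not ESPnet's implementation and does not show the self-conditioning or multi-task prompting used by OWSM-CTC; the blank index `0`, the function name `ctc_greedy_decode`, and the toy vocabulary are all hypothetical.

```python
from typing import List, Optional, Sequence

BLANK_ID = 0  # assumption: the blank token sits at index 0 of the vocabulary


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = BLANK_ID
) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Pick the highest-scoring label for each frame.
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # Emit a label only if it differs from the previous frame's label and
        # is not blank; a blank between two identical labels keeps both.
        if label != prev and label != blank_id:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    # Hypothetical toy vocabulary: 0 = blank, 1 = "a", 2 = "b".
    vocab = {1: "a", 2: "b"}
    scores = [
        [0.1, 0.8, 0.1],    # "a"
        [0.1, 0.7, 0.2],    # "a" again, merged with the previous frame
        [0.9, 0.05, 0.05],  # blank
        [0.2, 0.6, 0.2],    # "a" after a blank, emitted as a new token
        [0.1, 0.1, 0.8],    # "b"
    ]
    ids = ctc_greedy_decode(scores)
    print(ids)  # [1, 1, 2]
    print("".join(vocab[i] for i in ids))  # aab
```

The blank symbol is what lets CTC represent genuinely repeated tokens ("aa"): without a blank between them, consecutive identical frame labels are merged into one.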

Due to time constraints, the model used in the paper was trained for 40 "epochs". To match the setup of the encoder-decoder OWSM, this repo also includes a newer model trained for 45 "epochs", which generally performs better than the old one.