metadata
tags:
- espnet
- audio
- automatic-speech-recognition
language: et
license: cc-by-4.0
Estonian Espnet2 ASR model
Model description
This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech.
Intended uses & limitations
This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
How to use
from espnet2.bin.asr_inference import Speech2Text
model = Speech2Text.from_pretrained(
"TalTechNLP/espnet2_estonian"
)
speech, rate = soundfile.read("speech.wav")
text, *_ = model(speech)
Limitations and bias
Training data
Training procedure
Evaluation results
BibTeX entry and citation info
Citing ESPnet
@inproceedings{watanabe2018espnet,
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
title={{ESPnet}: End-to-End Speech Processing Toolkit},
year={2018},
booktitle={Proceedings of Interspeech},
pages={2207--2211},
doi={10.21437/Interspeech.2018-1456},
url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7654--7658},
year={2020},
organization={IEEE}
}