File size: 887 Bytes
9cd752f 9bbe0c8 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
---
language: xal
tags:
- speech
- audio
- automatic-speech-recognition
license: apache-2.0
---
## Info
This Wav2Vec2 model was first pretrained on 500 hours Kalmyk TV recordings and 1000 hours Mongolian speech recognition dataset. After that, the model was finetuned on a 300 hours [Kalmyk synthetic STT dataset](https://github.com/tugstugi/mongolian-nlp#datasets) created by a voice conversion model.
* 50% WER on a private test set created from Kalmyk TV recordnings
* on clean voice recordings, the model should have much lower WER
* voice conversion info
* 300 hours [Kalmyk synthetic STT dataset](https://github.com/tugstugi/mongolian-nlp#datasets)
* The source voice is a Kalmyk female voice TTS
* Target voices are from the VCTK dataset
* example data: https://twitter.com/tugstugi/status/1409111296897912835
* each WAV has a different text created from Kalmyk books
|