How much data
Hi, I see you replaced the old vocab with a new Russian IPA vocab. How much data did you use to train this model? Thank you.
Hello. The library contains one statistical model (for generating IPA transcriptions) and two BERT models for accentuation. To train the statistical model, I mainly used words from Wiktionary and Wikipedia (the Russian versions of which contain IPA transcriptions). To train the BERT models, I used ~3 GB of text data in which the correct accents had been placed for ambiguous words. I am currently working on increasing the amount of training data in order to resolve ambiguities in accentuation more accurately.
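(For reference, a minimal usage sketch of the library. The `Transcriptor` class and its `transcribe`/`accentuate` methods are assumptions based on my reading of the repo's README, so check there for the exact API.)

```python
# Sketch: getting IPA transcription and accent placement for Russian text.
# Assumes the omogre package exposes a Transcriptor class with
# transcribe() and accentuate() methods, as suggested by the repo README.
from omogre import Transcriptor

# Model data is expected to be downloaded into data_path on first use.
transcriptor = Transcriptor(data_path='omogre_data')

sentences = ['стены замка']
print(transcriptor.transcribe(sentences))   # IPA transcription
print(transcriptor.accentuate(sentences))   # text with accent marks placed
```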
Thank you for your reply. What I meant is: how much audio data did you use to train XTTS?
I'm sorry, I should have guessed. It was a small experiment, just to understand to what extent it makes sense to use transcription and accents for speech synthesis. I used ~60 hours of speech for training. The README (https://github.com/omogr/omogre/blob/main/README_eng.md) lists the acoustic data I used: the model was trained on the RUSLAN and Common Voice datasets.
https://ruslan-corpus.github.io/
https://commonvoice.mozilla.org/ru
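(To make the pipeline concrete, here is a hedged sketch of the preprocessing step such an experiment implies: transcribing the dataset's text to IPA before TTS training. The LJSpeech-style `metadata.csv` layout and the `Transcriptor` API are illustrative assumptions, not the author's actual training code.)

```python
# Hypothetical preprocessing step: convert the text column of a
# TTS metadata file ("wav_id|text" lines) to IPA so the synthesizer
# is trained on transcriptions instead of raw orthographic text.
# The omogre Transcriptor API is an assumption based on the repo README.
from omogre import Transcriptor

transcriptor = Transcriptor(data_path='omogre_data')

with open('metadata.csv', encoding='utf-8') as src, \
        open('metadata_ipa.csv', 'w', encoding='utf-8') as dst:
    for line in src:
        wav_id, text = line.rstrip('\n').split('|', 1)
        ipa = transcriptor.transcribe([text])[0]
        dst.write(f'{wav_id}|{ipa}\n')
```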
Thank you