How to fine-tune on a new language?

#10
by Chan-Y - opened

I want to fine-tune on a Turkish voice.

Read the section "Where is Voice Cloning?" in the Philosophy.

Yes, that. Also, I have not yet posted recipes for continuing training off the checkpoint uploaded to this repo. I might need to update the Philosophy to include something along the lines of:

Currently, Kokoro is packaged & delivered to you as an end product meant to be used & deployed.

That could change later, but no promises.

However, it should be fairly transparent that Kokoro uses a StyleTTS 2 architecture, which is FOSS/MIT, so rolling your own model is always an option. Multilingual STTS2 models can be and have been trained. A big hurdle is finding a good g2p (grapheme-to-phoneme) solution for your language. I only speak English, so this is a very tight bottleneck, along with data sourcing.
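To make the g2p hurdle concrete, here is a toy sketch of a letter-to-IPA mapping for Turkish. It is not what Kokoro or StyleTTS 2 uses. The table `TR_TO_IPA` and the helpers `turkish_lower` and `toy_g2p` are hypothetical names made up for illustration, and the mapping is a rough approximation (for example, it treats ğ as a simple length mark). A usable g2p would also need to handle loanwords, numbers, abbreviations and stress.

```python
# Toy letter-to-IPA table for Turkish. Approximate and for illustration only;
# this is an assumption-laden sketch, not a production g2p.
TR_TO_IPA = {
    "a": "a", "b": "b", "c": "dʒ", "ç": "tʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "ğ": "ː", "h": "h", "ı": "ɯ", "i": "i",
    "j": "ʒ", "k": "k", "l": "l", "m": "m", "n": "n", "o": "o",
    "ö": "ø", "p": "p", "r": "ɾ", "s": "s", "ş": "ʃ", "t": "t",
    "u": "u", "ü": "y", "v": "v", "y": "j", "z": "z",
}


def turkish_lower(text: str) -> str:
    """Lowercase text, mapping dotless I and dotted İ the Turkish way."""
    # Python's default lower() maps "I" to "i", which is wrong for Turkish,
    # so both capital forms are replaced explicitly first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def toy_g2p(text: str) -> str:
    """Map each letter to an IPA symbol; unknown characters pass through."""
    return "".join(TR_TO_IPA.get(ch, ch) for ch in turkish_lower(text))


if __name__ == "__main__":
    for word in ["merhaba", "çay", "Işık"]:
        print(word, "->", toy_g2p(word))
```

Even for a language with fairly regular spelling, the edge cases are where the effort goes, which is why a native speaker's ear matters when judging g2p output.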

For StyleTTS2:

Edit: other multilingual STTS2 models I'm aware of:

  • I believe Respair has also done Persian
  • Someone else (can't find their username on HF right now) has done Korean
  • Another person has done 5-way multilingual: English, German, French, Italian and Spanish

Without gatekeeping, I should warn you that training models (especially STTS2) is not for the faint of heart, and could take substantial compute, time and experience (all of which are obtainable) to produce good outcomes. I believe XTTS v2 might support Turkish out of the box, but I have not tried it.

hexgrad changed discussion status to closed

Does that mean we should leave Kokoro behind and move to StyleTTS 2 if we want to add a language that isn't supported yet?
