Should this be used for LJSpeech or Style conversion?

#1
by niranjanakella - opened

@Bluebomber182 Hello, I am currently testing this model and am not sure how it is meant to be used. Should I load it as an LJSpeech model, where I insert some noise to generate audio, or as a LibriTTS model, where I give a voice sample for style mimicking? Please confirm, and kindly provide the config.yml for this.

@niranjanakella
Are you using the Inference_LibriTTS.ipynb file via Jupyter Notebook? If so, use the StyleTTS2-LibriTTS config.yml from this link.
https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main/Models/LibriTTS
Then open the Inference_LibriTTS.ipynb file

jupyter notebook Inference_LibriTTS.ipynb

Add the location of the StyleTTS2-LibriTTS config.yml file


Add the location of the pth file


Add the location of the reference audio
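
Before running the notebook cells, it can help to confirm that the three locations mentioned above actually point at files. The following is only a minimal sketch and not part of the notebook: the helper name check_inference_paths and the expected file extensions (.yml/.yaml, .pth, .wav) are assumptions made for illustration.

```python
from pathlib import Path


def check_inference_paths(config_path, checkpoint_path, reference_wav):
    """Return a list of problems found with the three notebook inputs.

    Hypothetical helper: the extensions below are assumptions, not
    requirements stated anywhere in this thread.
    """
    expected = {
        "config": (Path(config_path), {".yml", ".yaml"}),
        "checkpoint": (Path(checkpoint_path), {".pth"}),
        "reference audio": (Path(reference_wav), {".wav"}),
    }
    problems = []
    for label, (path, suffixes) in expected.items():
        if not path.is_file():
            # The path does not exist or is a directory.
            problems.append(f"{label}: file not found: {path}")
        elif path.suffix.lower() not in suffixes:
            # The file exists but does not look like the expected type.
            problems.append(f"{label}: unexpected extension {path.suffix!r}")
    return problems
```

An empty list means all three files were found with the assumed extensions; otherwise each entry names the input that needs fixing in the notebook.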

@Bluebomber182 Given the current model's bottleneck of 512 tokens, is there any implementation that handles long-form text?
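
One generic workaround, not confirmed anywhere in this thread, is to split the input into sentence-sized chunks, synthesize each chunk separately, and concatenate the audio. Below is a rough sketch of the splitting step only. The function name split_for_tts and the max_chars value are assumptions, and a character count is only a loose proxy for the model's 512-token limit, since that limit is measured in model tokens rather than characters.

```python
import re


def split_for_tts(text, max_chars=400):
    """Group sentences into chunks of at most max_chars characters.

    Hypothetical helper: max_chars=400 is an arbitrary illustrative
    budget, not a value taken from the model. A single sentence longer
    than max_chars is kept whole and will exceed the budget.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the current chunk.
            current = candidate
        else:
            # Close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be passed to the inference call one at a time; how well the voice style stays consistent across chunks depends on the model and is not something this sketch addresses.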
