Should this be used for LJSpeech or Style conversion?

#1
by niranjanakella - opened

@Bluebomber182 Hello, I am currently testing this model and am not sure how to use it. Should I load it as an LJSpeech model, where I feed in noise to generate audio, or as a LibriTTS model, where I provide a voice sample for style mimicking? Please confirm, and kindly provide the config.yml for this.

@niranjanakella
Are you using the Inference_LibriTTS.ipynb file via Jupyter Notebook? If so, use the StyleTTS2-LibriTTS config.yml from this link.
https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main/Models/LibriTTS
Then open the Inference_LibriTTS.ipynb file

jupyter notebook Inference_LibriTTS.ipynb

Add the location of the StyleTTS2-LibriTTS config.yml file

StyleTTS2 01 Screenshot_20240704_022920.png

Add the location of the pth file

StyleTTS 02 Screenshot_20240704_023024.png

Add the location of the reference audio
StyleTTS2 04 Screenshot_20240704_023139.png
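
The three paths above (config.yml, the pth checkpoint, and the reference audio) are easy to mistype, so it can help to check them before running the notebook cells. The following is a minimal sketch using only the standard library; `check_inference_paths` is a hypothetical helper and is not part of StyleTTS2 or the notebook, and the file names in the usage note are placeholders.

```python
from pathlib import Path


def check_inference_paths(config_path, checkpoint_path, reference_wav):
    """Return resolved paths for the three files, or raise if any is missing.

    Hypothetical helper for illustration; the notebook itself does not define it.
    """
    paths = {
        "config": Path(config_path),
        "checkpoint": Path(checkpoint_path),
        "reference": Path(reference_wav),
    }
    # Collect the names of every path that does not point at an existing file.
    missing = [name for name, p in paths.items() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing file(s): {', '.join(missing)}")
    # Absolute paths avoid surprises if the notebook's working directory changes.
    return {name: p.resolve() for name, p in paths.items()}
```

For example, calling `check_inference_paths("Models/LibriTTS/config.yml", "path/to/model.pth", "path/to/reference.wav")` with your own locations either returns the absolute paths to paste into the notebook or names the file that could not be found.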

@Bluebomber182 Given the current model's bottleneck of 512 tokens, is there any implementation that handles long-form text?
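
One common workaround for a fixed input limit is to split the text at sentence boundaries, synthesize each chunk separately, and concatenate the audio. The sketch below only covers the splitting step. It is an assumption-laden illustration, not the model's own method: `split_into_chunks` is a hypothetical helper, and `max_chars` is a rough character budget standing in for the 512-token limit, since the actual token count depends on the model's phonemizer and tokenizer.

```python
import re


def split_into_chunks(text, max_chars=300):
    """Greedily group sentences into chunks of at most max_chars characters.

    Hypothetical helper. A single sentence longer than max_chars is kept
    whole as its own (oversized) chunk rather than being cut mid-sentence.
    """
    # Split after ., ! or ? when followed by whitespace; drop empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the inference call in turn, reusing the same reference audio so the voice stays consistent across chunks.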
