Finetune for single speaker

#5
by Shanos76 - opened

Hi @rumourscape ,

I want to fine-tune this for a single speaker. Which tokenizer should I use for fine-tuning?

SPRINGLab org

You don't need to modify or change the tokenizer if you only want to fine-tune for Hindi.

Hi @rumourscape ,

Firstly, thank you so much for your reply. I'm trying to fine-tune the SPRINGLab/F5-Hindi-24KHz model further for a single speaker using the F5-TTS training script, but after fine-tuning, the model generates only noise.

I suspect the issue might be due to using the compressed .safetensors model. I would be really grateful if you could share your thoughts on this and guide me on how I can properly fine-tune it using a single-speaker Hindi dataset.

Thank you so much in advance!

SPRINGLab org

I hope you are not using the convert_char_to_pinyin function. I did not use it when training since it added a space between each character.
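To illustrate the point above, here is a minimal sketch of why extra spaces matter for a character-level model. It does not call the real convert_char_to_pinyin function; `space_out` is a hypothetical stand-in that only mimics the "space between each character" behaviour described in this reply, and `to_char_tokens` assumes the simplest possible per-character tokenization.

```python
def to_char_tokens(text: str) -> list[str]:
    """Assumed tokenization: one token per Unicode code point."""
    return list(text)


def space_out(text: str) -> str:
    """Hypothetical stand-in: insert a space between every character."""
    return " ".join(text)


raw = "नमस्ते"
raw_tokens = to_char_tokens(raw)
spaced_tokens = to_char_tokens(space_out(raw))

# The spaced sequence interleaves a space token between every pair of
# original tokens, so it no longer matches what the model saw in training.
print(len(raw_tokens), len(spaced_tokens))
print(raw_tokens == spaced_tokens)
```

If the model was trained on the raw sequence, feeding it the spaced one at fine-tuning or inference time would present a different input distribution.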

Also, do you know why my model has been getting so many downloads in the past couple of days? Was it publicized by someone?

Hi @rumourscape ,

Sorry, I don't have any idea about that, but this model sounds really good, so maybe that's why it's attracting so much attention.

I tried removing the convert_char_to_pinyin function, but it still generates noise. Could you please write a short guide on how to fine-tune it whenever you have time? It would be a great help.

Thanks!

Hi @rumourscape ,

I tried fine-tuning the F5-TTS French model, and it works perfectly. I believe the issue might be that the checkpoint you shared is a reduced version, which doesn't support further training. Could you please share the original, non-reduced model file?

Thank you!

hamees changed discussion status to closed
SPRINGLab org

Alright, I have uploaded the original .pt file. This file also includes the gradients and optimizer states of the training run. It might help to remove them for finetuning.
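For anyone who wants to try removing the training state, here is a rough sketch of one way to do it. The key names `optimizer_state_dict` and `scheduler_state_dict` are assumptions, not confirmed for this file, so print the keys of the loaded checkpoint first and adjust the list. The helper names are made up for illustration.

```python
from typing import Any, Dict, Iterable

# Assumed key names; check the actual keys of your checkpoint before relying on them.
TRAINING_ONLY_KEYS = ("optimizer_state_dict", "scheduler_state_dict")


def prune_training_state(
    ckpt: Dict[str, Any], drop: Iterable[str] = TRAINING_ONLY_KEYS
) -> Dict[str, Any]:
    """Return a shallow copy of the checkpoint dict without the listed keys."""
    drop_set = set(drop)
    return {key: value for key, value in ckpt.items() if key not in drop_set}


def prune_checkpoint_file(
    src: str, dst: str, drop: Iterable[str] = TRAINING_ONLY_KEYS
) -> None:
    """Load a .pt checkpoint, drop training-only entries, and save the rest."""
    import torch  # imported here so the pure helper above works without PyTorch

    ckpt = torch.load(src, map_location="cpu")
    print("keys found:", list(ckpt.keys()))  # inspect before trusting the defaults
    torch.save(prune_training_state(ckpt, drop), dst)
```

Calling `prune_checkpoint_file("model.pt", "model_pruned.pt")` (both paths are placeholders) would write a smaller file that keeps the model weights and any other entries not listed in `drop`.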

rumourscape changed discussion status to open

Hey,
I am getting noisy output from the Hindi model.
What might I be doing wrong?
This is the command I am using:
"""python /workspace/F5-TTS/src/f5_tts/infer/infer_cli.py --model "F5-TTS-small" --ckpt_file "/workspace/F5-Hindi-24KHz/model_2500000.safetensors" --vocab_file "/workspace/F5-Hindi-24KHz/vocab.txt" --ref_audio "/workspace/F5-Hindi-24KHz/samples/dear_friends_cleaned_1001.wav" --ref_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और" --gen_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और" """

generated output:

rumourscape changed discussion status to closed
