Finetune for single speaker
Hi @rumourscape ,
I want to fine-tune this for a single speaker. Which tokenizer should I use for fine-tuning?
You don't need to modify or change the tokenizer if you only want to fine-tune for Hindi.
Hi @rumourscape ,
Firstly, thank you so much for your reply. I'm trying to fine-tune the SPRINGLab/F5-Hindi-24KHz model further for a single speaker using the F5 TTS training script, but after fine-tuning, the model only generates noise.
I suspect the issue might be due to using the compressed .safetensors model. I would be really grateful if you could share your thoughts on this and guide me on how I can properly fine-tune it using a single-speaker Hindi dataset.
Thank you so much in advance!
I hope you are not using the convert_char_to_pinyin function. I did not use it when training since it added a space between each character.
Also, do you know why my model has been getting so many downloads in the past couple of days? Was it publicized by someone?
Hi @rumourscape ,
Sorry, I don't have any idea about that, but this model sounds really good, so maybe that's why it's attracting so much attention.
I tried removing the convert_char_to_pinyin function, but it still generates noise. I would kindly request you to create a short guide on how to fine-tune it whenever you have time; it would be a great help.
Thanks!
Hi @rumourscape ,
I tried fine-tuning the F5 TTS French model, and it works perfectly. I believe the issue might be that the checkpoint you shared is a reduced version, which doesn't support further training. Could you please share the original, non-reduced model file?
Thank you!
Alright, I have uploaded the original .pt file. This file also includes the gradients and optimizer states of the training run. It might help to remove them for finetuning.
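For anyone who wants to try removing the optimizer states mentioned above, here is a minimal sketch in Python. It assumes the .pt file is a plain dictionary saved with torch.save. The key names "optimizer_state_dict" and "scheduler_state_dict", and both function names, are assumptions for illustration, so print the keys of your own checkpoint first and adjust the list to match.

```python
from typing import Any, Dict, Iterable

# Assumed key names for training-only entries; they are not confirmed in this
# thread, so check the keys of your own checkpoint before relying on them.
TRAINING_ONLY_KEYS = ("optimizer_state_dict", "scheduler_state_dict")


def strip_training_state(
    checkpoint: Dict[str, Any],
    drop_keys: Iterable[str] = TRAINING_ONLY_KEYS,
) -> Dict[str, Any]:
    """Return a shallow copy of the checkpoint without the training-only entries."""
    drop = set(drop_keys)
    return {key: value for key, value in checkpoint.items() if key not in drop}


def strip_checkpoint_file(src_path: str, dst_path: str) -> None:
    """Load a .pt checkpoint, drop the training-only entries, and save the rest."""
    import torch  # imported lazily so strip_training_state works without PyTorch

    checkpoint = torch.load(src_path, map_location="cpu")
    print("keys found:", list(checkpoint.keys()))  # verify the names before trusting the result
    torch.save(strip_training_state(checkpoint), dst_path)
```

Calling strip_checkpoint_file with the path of the original file and a new output path writes a smaller file and leaves the original untouched. Whether the training script accepts the stripped file depends on which keys it expects when it loads a pretrained checkpoint, so treat this as a starting point rather than a confirmed fix.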
Hey,
I am getting noisy output using the Hindi model.
What might I be doing wrong?
The command I am using:
"""python /workspace/F5-TTS/src/f5_tts/infer/infer_cli.py --model "F5-TTS-small" --ckpt_file "/workspace/F5-Hindi-24KHz/model_2500000.safetensors" --vocab_file "/workspace/F5-Hindi-24KHz/vocab.txt" --ref_audio "/workspace/F5-Hindi-24KHz/samples/dear_friends_cleaned_1001.wav" --ref_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और" --gen_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और" """
generated output: