Fine-tune for a single speaker

by Shanos76 - opened Dec 29, 2024

Discussion

Shanos76

Dec 29, 2024

Hi @rumourscape ,

I want to fine-tune this for a single speaker. Which tokenizer should I use for fine-tuning?

rumourscape

SPRINGLab org Dec 30, 2024

You don't need to modify the tokenizer if you are only fine-tuning for Hindi.
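One way to confirm that the existing vocab already covers a new single-speaker Hindi dataset is to check every character in the transcripts against `vocab.txt`. This is a minimal sketch that assumes `vocab.txt` stores one character per line and that the line index is the token id; verify this against the F5-TTS tokenizer code you are using. The script name and the transcript file are hypothetical.

```python
import sys


def load_vocab(vocab_path: str) -> dict[str, int]:
    """Map each vocab line to its line index.

    Assumption: one character per line, line index == token id.
    """
    vocab: dict[str, int] = {}
    with open(vocab_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            # Strip only the newline so a line holding a single space is kept.
            token = line.rstrip("\n")
            vocab.setdefault(token, i)
    return vocab


def missing_chars(texts: list[str], vocab: dict[str, int]) -> set[str]:
    """Return the characters used in the transcripts that the vocab lacks."""
    return {ch for text in texts for ch in text if ch not in vocab}


if __name__ == "__main__":
    # Usage (hypothetical): python check_vocab.py vocab.txt transcripts.txt
    vocab = load_vocab(sys.argv[1])
    with open(sys.argv[2], encoding="utf-8") as f:
        texts = [line.rstrip("\n") for line in f]
    missing = missing_chars(texts, vocab)
    print("all characters covered" if not missing else f"missing: {sorted(missing)}")
```

If nothing is reported missing, the dataset can use the same vocab. Note that Devanagari vowel signs and the virama are separate code points, so they are checked individually.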

Shanos76

Dec 30, 2024

Hi @rumourscape ,

Firstly, thank you so much for your reply. I'm trying to further fine-tune the SPRINGLab/F5-Hindi-24KHz model for a single speaker using the F5-TTS training script, but after fine-tuning, the model only generates noise.

I suspect the issue might be due to using the compressed .safetensors model. I would be really grateful if you could share your thoughts on this and guide me on how I can properly fine-tune it using a single-speaker Hindi dataset.

Thank you so much in advance!

rumourscape

SPRINGLab org Dec 31, 2024

I hope you are not using the convert_char_to_pinyin function. I did not use it during training because it adds a space between every character.

Also, do you know why my model has been getting so many downloads in the past couple of days? Was it publicized by someone?
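The point about convert_char_to_pinyin can be illustrated with character-level token ids: if the model was trained on plain text but fine-tuning or inference inserts a space between every character, it receives id sequences it never saw in training. This is a toy sketch only; the vocab and the `space_between_chars` helper are hypothetical stand-ins and are not F5-TTS's actual convert_char_to_pinyin.

```python
def char_ids(text: str, vocab: dict[str, int], unk: int = 0) -> list[int]:
    """Character-level tokenization: one id per Unicode code point."""
    return [vocab.get(ch, unk) for ch in text]


def space_between_chars(text: str) -> str:
    """Hypothetical preprocessing step that puts a space between characters."""
    return " ".join(text)


if __name__ == "__main__":
    text = "नमस्ते"
    # Toy vocab for illustration: space first, then each distinct character.
    vocab = {ch: i for i, ch in enumerate(" " + "".join(dict.fromkeys(text)))}
    plain = char_ids(text, vocab)
    spaced = char_ids(space_between_chars(text), vocab)
    print(len(plain), len(spaced))  # prints "6 11"
    print(plain)   # [1, 2, 3, 4, 5, 6]
    print(spaced)  # [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6]
```

The spaced sequence is almost twice as long and interleaves the space id, so text preprocessing at fine-tuning time should match whatever was used for the original training run.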

Shanos76

Dec 31, 2024

Hi @rumourscape ,

Sorry, I don't have any idea about that, but this model sounds really good, so maybe that's why it's attracting so much attention.

I tried removing the convert_char_to_pinyin function, but it still generates noise. Could you create a short guide on how to fine-tune it whenever you have time? It would be a great help.

Thanks!

Shanos76

Jan 1

Hi @rumourscape ,

I tried fine-tuning the F5 TTS French model, and it works perfectly. I believe the issue might be that the checkpoint you shared is a reduced version, which doesn't support further training. Could you please share the original, non-reduced model file?

Thank you!
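A `.safetensors` file holds only named tensors (no optimizer or scheduler state), so listing its tensor names is a quick way to see what a shared checkpoint actually contains. The format starts with an 8-byte little-endian header length followed by a JSON header, which can be read with the standard library alone. This is a minimal sketch; which prefixes you will see (for example whether weights sit under an EMA prefix) is an assumption to check on your own file, not something confirmed in this thread.

```python
import json
import struct
import sys


def list_safetensors_keys(path: str) -> list[str]:
    """Return tensor names stored in a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: JSON mapping tensor name -> dtype/shape/offsets.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor.
    return sorted(k for k in header if k != "__metadata__")


def summarize(keys: list[str]) -> dict[str, int]:
    """Count tensors by top-level prefix (the text before the first '.')."""
    counts: dict[str, int] = {}
    for k in keys:
        prefix = k.split(".", 1)[0]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts


if __name__ == "__main__":
    keys = list_safetensors_keys(sys.argv[1])
    print(f"{len(keys)} tensors")
    for prefix, n in sorted(summarize(keys).items()):
        print(f"  {prefix}: {n}")
```

Run it as `python list_keys.py model_2500000.safetensors` (the script name is hypothetical) and compare the prefixes with what your training script expects to load.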

hamees changed discussion status to closed Jan 1

rumourscape

SPRINGLab org Jan 1

Alright, I have uploaded the original .pt file. This file also includes the gradients and optimizer states of the training run. Removing them before fine-tuning might help.
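Removing the optimizer and scheduler state from a `.pt` training checkpoint can be done by keeping only the weight entries and saving a new file. This is a minimal sketch: the key names `model_state_dict` and `ema_model_state_dict` are assumptions about how F5-TTS training checkpoints are laid out, so print the keys of your file first and adjust `KEEP` if they differ.

```python
import sys

# Assumed key names for the weight entries; adjust after inspecting your file.
KEEP = ("model_state_dict", "ema_model_state_dict")


def strip_training_state(ckpt: dict, keep: tuple[str, ...] = KEEP) -> dict:
    """Return a new dict containing only the weight entries listed in `keep`."""
    kept = {k: ckpt[k] for k in keep if k in ckpt}
    if not kept:
        raise KeyError(f"none of {keep} found; available keys: {sorted(ckpt)}")
    return kept


if __name__ == "__main__":
    import torch  # only needed when actually loading/saving a .pt file

    src, dst = sys.argv[1], sys.argv[2]
    # weights_only=False because training checkpoints may contain non-tensor
    # objects; only do this for files you trust.
    ckpt = torch.load(src, map_location="cpu", weights_only=False)
    print("keys in source checkpoint:", sorted(ckpt))
    torch.save(strip_training_state(ckpt), dst)
```

Keeping the EMA weights matters if your training script initializes from them when no optimizer state is present; whether it does is worth checking in the trainer code you use.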

rumourscape changed discussion status to open Jan 1

AbhishekTiwariAKT

Jan 20

Hey,
I am getting noisy output with the Hindi model. What might I be doing wrong?
This is the command I am using:
python /workspace/F5-TTS/src/f5_tts/infer/infer_cli.py \
  --model "F5-TTS-small" \
  --ckpt_file "/workspace/F5-Hindi-24KHz/model_2500000.safetensors" \
  --vocab_file "/workspace/F5-Hindi-24KHz/vocab.txt" \
  --ref_audio "/workspace/F5-Hindi-24KHz/samples/dear_friends_cleaned_1001.wav" \
  --ref_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और" \
  --gen_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और"

Generated output:

rumourscape changed discussion status to closed Jan 20
