This is an Italian finetune of F5-TTS. It was trained on Italian only, so it can't speak English properly.

Trained on 73+ hours of the "train" split of the ylacombe/cml-tts dataset on 8x RTX 4090 GPUs (training is still in progress), using the Gradio finetuning app with the following settings:

exp_name="F5TTS_Base"
learning_rate=0.00001
batch_size_per_gpu=10000
batch_size_type="frame"
max_samples=64
grad_accumulation_steps=1
max_grad_norm=1
epochs=300
num_warmup_updates=2000
save_per_updates=600
last_per_steps=300
finetune=true
file_checkpoint_train=""
tokenizer_type="char"
tokenizer_file=""
mixed_precision="fp16"
logger="wandb"
bnb_optimizer=false
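
As a rough sanity check on these settings, the sketch below estimates how much audio goes into one optimizer update. It assumes the frame budget applies per GPU and adds up across the 8 GPUs, and it assumes mel settings of 24 kHz audio with a hop length of 256 samples. Neither assumption is stated on this card, so treat the resulting duration as an estimate only.

```python
def frames_per_update(batch_size_per_gpu: int, num_gpus: int,
                      grad_accumulation_steps: int) -> int:
    """Total mel frames consumed by one optimizer update across all GPUs."""
    return batch_size_per_gpu * num_gpus * grad_accumulation_steps


def frames_to_seconds(frames: int, hop_length: int = 256,
                      sample_rate: int = 24_000) -> float:
    """Convert mel frames to seconds of audio.

    hop_length and sample_rate are ASSUMED defaults, not values from this card.
    """
    return frames * hop_length / sample_rate


# Values taken from the settings above: 10000 frames per GPU, 8 GPUs, no accumulation.
frames = frames_per_update(10_000, 8, 1)
minutes = frames_to_seconds(frames) / 60
print(f"{frames} frames per update, about {minutes:.1f} minutes of audio")
```

Because batches are also capped by max_samples=64, batches made of short clips may hold fewer frames than this upper bound.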

Preprocessing

The transcriptions extracted from the dataset were preprocessed before training. As far as I understand, punctuation matters because the model learns pauses and proper intonation from it, so it has been preserved.
The original Italian "text" field also contained direct-dialogue markers (long dashes), which have been preserved as well. It also contained hyphens that split a word across a line break (I don't know which process was used to create the transcriptions in the original dataset). I removed those hyphens and merged the two parts of each word; otherwise the model would have been trained on text artifacts that have no effect on the speech.
This applies only to the Italian data in cml-tts; I don't know whether other languages are affected too.
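
The hyphen cleanup described above can be sketched roughly as follows. This is not the exact script that was used. It assumes a broken word looks like "paro- la", that is, a letter, a hyphen, some whitespace or a newline, then another letter. The function name and the sample sentence are made up for illustration.

```python
import re

# ASSUMPTION: a line-break hyphen is a "-" with a word character before it
# and whitespace (space or newline) followed by a word character after it.
_BROKEN_WORD = re.compile(r"(?<=\w)-\s+(?=\w)")


def merge_hyphenated_words(text: str) -> str:
    """Join word halves split by a line-break hyphen.

    Dialogue dashes (U+2014) and other punctuation are left untouched,
    as are real compounds such as "italo-americano" (no whitespace after "-").
    """
    return _BROKEN_WORD.sub("", text)


# Hypothetical transcription with a dialogue dash and two broken words.
sample = "\u2014 Buon- giorno, disse l'uo-\nmo."
print(merge_hyphenated_words(sample))
```

A pattern this simple would also merge a suspended hyphen followed by a space (as in "pre- e post-"), so it is worth spot-checking the output on real data.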

Current most trained model

model_159600.safetensors (~290 epochs)
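
For orientation, the step number in the filename and the approximate epoch count give a rough number of updates per epoch, and from that a projected final step for the configured 300 epochs. The sketch below only does that arithmetic. Since the epoch count is approximate, so are the results, and the function names are illustrative.

```python
def updates_per_epoch(step: int, epochs_done: float) -> float:
    """Average optimizer updates per epoch so far."""
    return step / epochs_done


def projected_final_step(step: int, epochs_done: float, total_epochs: int) -> int:
    """Step count expected at total_epochs, assuming a constant rate."""
    return round(updates_per_epoch(step, epochs_done) * total_epochs)


# 159600 comes from the checkpoint name, ~290 from the epoch estimate,
# and 300 from the epochs setting.
print(f"about {updates_per_epoch(159_600, 290):.0f} updates per epoch")
print(f"projected final step: about {projected_final_step(159_600, 290, 300)}")
```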

Known problems

  • catastrophic forgetting: being trained on Italian only, the model lost its English ability. A proper multilingual dataset should be used instead of a single-language one.
  • pronunciation is not perfect
  • numbers must be converted to words to be pronounced in Italian
  • a better dataset with more diverse voices would help improve zero-shot cloning
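
As an illustration of the number problem, here is a minimal sketch that spells out the integers 0 to 99 in Italian before text is passed to the model. It is not part of this model or of F5-TTS, and the function names are made up. Larger numbers, decimals, dates and ordinals are left unchanged, so a dedicated library such as num2words (which offers Italian support) is probably a better fit for real use.

```python
import re

_UNITS = ["zero", "uno", "due", "tre", "quattro", "cinque", "sei", "sette",
          "otto", "nove", "dieci", "undici", "dodici", "tredici",
          "quattordici", "quindici", "sedici", "diciassette", "diciotto",
          "diciannove"]
_TENS = {2: "venti", 3: "trenta", 4: "quaranta", 5: "cinquanta",
         6: "sessanta", 7: "settanta", 8: "ottanta", 9: "novanta"}


def italian_number(n: int) -> str:
    """Spell out an integer in Italian. Sketch only: covers 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return _UNITS[n]
    tens, unit = divmod(n, 10)
    word = _TENS[tens]
    if unit == 0:
        return word
    if unit in (1, 8):
        word = word[:-1]      # elision: venti + uno -> ventuno
    if unit == 3:
        return word + "tré"   # compounds ending in "tre" take an accent
    return word + _UNITS[unit]


def spell_out_numbers(text: str) -> str:
    """Replace standalone 1-2 digit numbers with Italian words."""
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: italian_number(int(m.group())), text)


print(spell_out_numbers("Ho 21 anni e 3 gatti."))
```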

Checkpoints folder

Contains the checkpoint weights saved at specific steps; the higher the number, the further training had progressed. The weights in this folder can be used as a starting point to continue training. Ping me if you manage to finetune it further and reach a better result.
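
Since the step number is part of the filename, picking the most trained checkpoint means comparing those numbers as integers, not as strings. The sketch below assumes all checkpoints follow the model_<step>.safetensors pattern seen above. Apart from model_159600.safetensors, the filenames in the example are hypothetical.

```python
import re

# ASSUMPTION: checkpoints are named model_<step>.safetensors.
_STEP = re.compile(r"model_(\d+)\.safetensors$")


def checkpoint_step(filename: str):
    """Return the training step encoded in a checkpoint name, or None."""
    match = _STEP.search(filename)
    return int(match.group(1)) if match else None


def latest_checkpoint(filenames):
    """Return the checkpoint with the highest step, or None if there is none."""
    numbered = [(checkpoint_step(name), name) for name in filenames]
    numbered = [(step, name) for step, name in numbered if step is not None]
    return max(numbered)[1] if numbered else None


# Hypothetical folder listing; a plain string sort would wrongly rank
# "model_600" above "model_159600".
files = ["model_600.safetensors", "model_159600.safetensors",
         "model_12000.safetensors", "README.md"]
print(latest_checkpoint(files))
```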

Model tree for alien79/F5-TTS-italian

Base model: SWivid/F5-TTS

Dataset used to train alien79/F5-TTS-italian: ylacombe/cml-tts