alien79/F5-TTS-italian


---
datasets:
- ylacombe/cml-tts
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
library_name: f5-tts
---

This is an Italian fine-tune of F5-TTS.
It is Italian-only, so it can't speak English properly.

Trained on 73+ hours of the "train" split of the ylacombe/cml-tts dataset
with 8x RTX 4090 (training is still in progress), using the Gradio fine-tuning app with the following settings:
```
exp_name = "F5TTS_Base"
learning_rate = 0.00001
batch_size_per_gpu = 10000
batch_size_type = "frame"
max_samples = 64
grad_accumulation_steps = 1
max_grad_norm = 1
epochs = 100
num_warmup_updates = 2000
save_per_updates = 600
last_per_steps = 300
finetune = true
file_checkpoint_train = ""
tokenizer_type = "char"
tokenizer_file = ""
mixed_precision = "fp16"
logger = "wandb"
bnb_optimizer = false
```
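With `batch_size_type` set to `"frame"`, `batch_size_per_gpu` counts mel-spectrogram frames rather than utterances, and `max_samples` caps how many utterances one batch may hold. The sketch below estimates how much audio such a batch covers and shows a greedy frame-budget packing. It assumes the default F5-TTS mel settings (24 kHz sample rate, hop length 256); `pack_by_frames` is a hypothetical, simplified helper in the spirit of dynamic batching, not the f5-tts implementation.

```python
# Assumed mel-spectrogram settings (F5-TTS defaults): 24 kHz audio, hop length 256.
SAMPLE_RATE = 24_000
HOP_LENGTH = 256
FRAMES_PER_SECOND = SAMPLE_RATE / HOP_LENGTH  # 93.75 mel frames per second

# Values taken from the training settings above.
BATCH_FRAMES_PER_GPU = 10_000
MAX_SAMPLES = 64
NUM_GPUS = 8
GRAD_ACCUMULATION_STEPS = 1


def seconds_per_update() -> float:
    """Upper bound on seconds of audio consumed by one optimizer update."""
    frames = BATCH_FRAMES_PER_GPU * NUM_GPUS * GRAD_ACCUMULATION_STEPS
    return frames / FRAMES_PER_SECOND


def pack_by_frames(frame_lengths, max_frames=BATCH_FRAMES_PER_GPU, max_samples=MAX_SAMPLES):
    """Greedily group utterance indices, shortest first, into frame-budgeted batches.

    A batch is closed when adding the next utterance would exceed max_frames
    or when it already holds max_samples utterances. An utterance longer than
    max_frames still ends up alone in its own batch. (Hypothetical helper.)
    """
    order = sorted(range(len(frame_lengths)), key=lambda i: frame_lengths[i])
    batches, current, current_frames = [], [], 0
    for idx in order:
        length = frame_lengths[idx]
        if current and (current_frames + length > max_frames or len(current) >= max_samples):
            batches.append(current)
            current, current_frames = [], 0
        current.append(idx)
        current_frames += length
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(f"{BATCH_FRAMES_PER_GPU / FRAMES_PER_SECOND:.1f} s of audio per GPU batch (at most)")
    print(f"{seconds_per_update():.1f} s of audio per update across {NUM_GPUS} GPUs (at most)")
    # Hypothetical utterance lengths in frames, packed with a small budget for readability.
    lengths = [470, 1400, 940, 1200, 600, 1100, 800, 1300, 1000, 700]
    print(pack_by_frames(lengths, max_frames=3000))
```

Under these assumptions one GPU batch holds at most about 107 seconds of audio, so a single update across the 8 GPUs sees at most about 853 seconds; real batches are usually a little under the budget.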

# Preprocessing
The transcriptions extracted from the data source were preprocessed before training.
As far as I understand, punctuation matters because it teaches the model where to pause and how to intonate, so it has been preserved.
The original Italian "text" field also contained direct-dialogue markers (long dashes), which have been preserved as well. However, it also contained
hyphens used to split a word across a line break (I don't know which process was used to create the transcriptions in the original dataset),
so I removed those hyphens and merged the two parts of each word; otherwise the model would have been trained on text artifacts that have no counterpart in the speech.
This only concerns the Italian data in cml-tts; I don't know whether other languages are affected too.
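The hyphen cleanup described above can be sketched as a single regular-expression pass. The exact rule used for this model is not stated, so this assumes that a split word appears as a letter, an ASCII hyphen, whitespace, and then the rest of the word (for example `pa- role`); `clean_transcription` is a hypothetical name. Because only the ASCII hyphen is targeted, long dialogue dashes such as U+2014 and all other punctuation are left untouched.

```python
import re

# Assumption: a word split across a line break shows up in the "text" field as
# a letter, an ASCII hyphen, whitespace (space or newline), then a letter.
# Compounds such as "italo-americano" have no whitespace after the hyphen, and
# spaced hyphens such as "Roma - Milano" have no letter right before it, so
# neither is matched. [^\W\d_] means "any Unicode letter", accented ones included.
LINE_BREAK_HYPHEN = re.compile(r"(?<=[^\W\d_])-\s+(?=[^\W\d_])")


def clean_transcription(text: str) -> str:
    """Merge words split by line-break hyphens; keep punctuation and dialogue dashes."""
    return LINE_BREAK_HYPHEN.sub("", text)


if __name__ == "__main__":
    # "\u2014" is the long dialogue dash, which must survive the cleanup.
    print(clean_transcription("\u2014 Erano pa- role antiche, \u2014 disse."))
```

A hyphenation rule this simple can still merge a genuine "letter- letter" sequence, so spot-checking a sample of changed lines before training is worthwhile.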


# Most trained model so far
`model_25200.safetensors` (45 epochs)


### `checkpoints` folder
Contains the checkpoint weights saved at specific update steps: the higher the number, the further into training the checkpoint was taken.
Weights in this folder can be used as a starting point to continue training.
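Since checkpoint filenames encode the number of updates, picking the most advanced checkpoint to resume from comes down to parsing that number. A minimal sketch, assuming the files follow the `model_<updates>.safetensors` pattern of `model_25200.safetensors` above (the actual contents of the folder may differ); `checkpoint_step` and `latest_checkpoint` are hypothetical helpers, not part of f5-tts.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are named like "model_<updates>.safetensors";
# any other file in the folder is ignored.
CHECKPOINT_NAME = re.compile(r"^model_(\d+)\.safetensors$")


def checkpoint_step(filename: str) -> Optional[int]:
    """Return the update count encoded in a checkpoint filename, or None."""
    match = CHECKPOINT_NAME.match(filename)
    return int(match.group(1)) if match else None


def latest_checkpoint(folder: str) -> Optional[Path]:
    """Return the checkpoint with the highest update count in folder, or None."""
    candidates = [
        (step, path)
        for path in Path(folder).iterdir()
        if path.is_file() and (step := checkpoint_step(path.name)) is not None
    ]
    if not candidates:
        return None
    # Tuples compare by step first, so max() picks the furthest-trained file.
    return max(candidates)[1]


if __name__ == "__main__":
    updates = checkpoint_step("model_25200.safetensors")
    print(updates)
    # 25200 updates over 45 epochs, as stated above.
    print(updates // 45)
```

With the figures given above, 25200 updates over 45 epochs works out to 560 updates per epoch for this run.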