PetrosStav / F5-TTS-Greek


	---
	license: cc-by-nc-4.0
	datasets:
	- amphion/Emilia-Dataset
	- mozilla-foundation/common_voice_12_0
	language:
	- el
	- en
	base_model:
	- SWivid/F5-TTS
	pipeline_tag: text-to-speech
	---

	# F5-TTS-Greek

	## F5-TTS model finetuned to speak Greek

	(This work is under development and currently in beta.)

	Fine-tuned on Greek speech datasets plus a small part of the Emilia-EN dataset, which is included to prevent catastrophic forgetting of English.

	The model can synthesize speech for Greek text given a Greek reference recording, for English text given an English reference recording, and for mixed Greek and English text (quality for mixed text still needs improvement, and several runs may be needed to get a good result).
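The reference-speech workflow described above can be sketched as a small helper that assembles an inference command. This is a sketch under assumptions, not the author's documented procedure: it assumes the `f5-tts_infer-cli` entry point from the upstream F5-TTS package and its `--ckpt_file`, `--vocab_file`, `--ref_audio`, `--ref_text`, `--gen_text`, `--output_dir`, and `--model` options (confirm with `f5-tts_infer-cli --help` for your installed version). All file names are hypothetical placeholders, since the model card does not name the checkpoint or vocabulary files.

```python
from __future__ import annotations

import shlex


def build_infer_command(
    ckpt_file: str,
    vocab_file: str,
    ref_audio: str,
    ref_text: str,
    gen_text: str,
    output_dir: str = "outputs",
    model_name: str | None = None,
) -> list[str]:
    """Assemble an argument list for the (assumed) F5-TTS inference CLI.

    The reference audio and its transcript should be in the same language
    as the text to generate (Greek with Greek, English with English).
    """
    cmd = ["f5-tts_infer-cli"]
    if model_name is not None:
        # Architecture name expected by the installed F5-TTS version (assumption).
        cmd += ["--model", model_name]
    cmd += [
        "--ckpt_file", ckpt_file,    # fine-tuned checkpoint (hypothetical path)
        "--vocab_file", vocab_file,  # vocabulary used for fine-tuning (hypothetical path)
        "--ref_audio", ref_audio,    # short reference recording
        "--ref_text", ref_text,      # transcript of the reference recording
        "--gen_text", gen_text,      # text to synthesize
        "--output_dir", output_dir,
    ]
    return cmd


if __name__ == "__main__":
    command = build_infer_command(
        ckpt_file="path/to/greek_checkpoint.pt",  # hypothetical
        vocab_file="path/to/vocab.txt",           # hypothetical
        ref_audio="reference_el.wav",             # hypothetical
        ref_text="Καλημέρα, τι κάνεις;",
        gen_text="Σήμερα ο καιρός είναι πολύ καλός.",
    )
    # Print a shell-safe command line; run it yourself once the paths are real.
    print(shlex.join(command))
```

Building the command as a list keeps the Greek text intact without shell-quoting issues, and the list can be passed directly to `subprocess.run` once real paths are filled in.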

	## Datasets used:

	- Common Voice 12.0 (All Greek Splits) (https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0)
	- Greek Single Speaker Speech Dataset (https://www.kaggle.com/datasets/bryanpark/greek-single-speaker-speech-dataset)
	- Small part of Emilia Dataset (https://huggingface.co/datasets/amphion/Emilia-Dataset) (EN-B000049.tar)

	## Training arguments

	- Learning Rate: 0.00001
	- Batch Size per GPU: 3200
	- Max Samples: 64
	- Gradient Accumulation Steps: 1
	- Max Gradient Norm: 1
	- Epochs: 277
	- Warmup Updates: 1274
	- Save per Updates: 25000
	- Last per Steps: 1000
	- Mixed Precision: fp16
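To make the training arguments easier to reuse, the values listed above can be collected in a small config object. The sketch below also illustrates how the warmup setting interacts with the learning rate, assuming a linear warmup from zero to the peak rate. That schedule shape is an assumption (the card only lists the number of warmup updates), and the sketch simply holds the peak afterwards rather than modelling any later decay. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingArgs:
    """Hyperparameters copied from the list above (field names are hypothetical)."""
    learning_rate: float = 1e-5
    batch_size_per_gpu: int = 3200
    max_samples: int = 64
    grad_accumulation_steps: int = 1
    max_grad_norm: float = 1.0
    epochs: int = 277
    warmup_updates: int = 1274
    save_per_updates: int = 25000
    last_per_steps: int = 1000
    mixed_precision: str = "fp16"


def warmup_lr(update: int, args: TrainingArgs) -> float:
    """Learning rate at a given update, assuming linear warmup from 0 to the peak.

    After warmup the peak rate is returned unchanged; decay is not modelled here.
    """
    if update < 0:
        raise ValueError("update must be non-negative")
    return args.learning_rate * min(1.0, update / args.warmup_updates)


if __name__ == "__main__":
    args = TrainingArgs()
    for step in (0, 637, 1274, 25000):
        print(step, warmup_lr(step, args))
```

Under this assumption, the rate reaches half of its peak at update 637 and the full 0.00001 at update 1274.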


	## Links:

	GitHub: https://github.com/SWivid/F5-TTS

	Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (https://arxiv.org/abs/2410.06885)