---
license: cc-by-nc-4.0
datasets:
  - amphion/Emilia-Dataset
  - mozilla-foundation/common_voice_12_0
language:
  - el
  - en
base_model:
  - SWivid/F5-TTS
pipeline_tag: text-to-speech
---

# F5-TTS-Greek

An F5-TTS model fine-tuned to speak Greek.

(This work is under development and currently in beta.)

Fine-tuned on Greek speech datasets, plus a small part of the Emilia-EN dataset to prevent catastrophic forgetting of English.

The model can generate speech from Greek text with a Greek reference recording, from English text with an English reference recording, and from mixed Greek and English (quality here still needs improvement, and several runs may be needed to get good results).

NOTE: For Greek text, there is an issue with uppercase characters: the model skips them, so use lowercase characters only!
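Since uppercase Greek characters are skipped, it is safest to lowercase the text before passing it to the model. Python's built-in `str.lower()` handles the Greek alphabet, including accented letters:

```python
def prepare_greek_text(text: str) -> str:
    """Lowercase text before inference, since the model skips uppercase Greek characters."""
    return text.lower()

# Mixed-case Greek (and any English) is folded to lowercase.
print(prepare_greek_text("Γεια ΣΟΥ, κόσμε"))  # γεια σου, κόσμε
```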

NOTE 2: Because the training data contained short reference audios, the best reference length is around 6-9 seconds, instead of the 15 seconds supported by the original model.
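To stay in that range, you can check a clip's duration before using it as a reference. A minimal sketch using only the Python standard library; the 7-second silent clip written here is just for demonstration:

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

# Create a 7-second silent mono clip just to demonstrate the check.
with wave.open("reference.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 16-bit samples
    wav.setframerate(24000)
    wav.writeframes(b"\x00\x00" * 24000 * 7)

duration = wav_duration_seconds("reference.wav")
print(f"{duration:.1f}s")  # 7.0s
if not 6.0 <= duration <= 9.0:
    print("Consider trimming the reference to 6-9 seconds.")
```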

## Datasets used

## Training

Training was done on a single RTX 3090.

After some manual evaluation, these two checkpoints produced the best results:

## How to use

As of commit dcd9a19 of the main GitHub project, you can use custom models directly in the infer_gradio app:


You can either download the models and use their local paths, or use the Hugging Face paths of this repo directly:

- hf://PetrosStav/F5-TTS-Greek/model_325000.safetensors
- hf://PetrosStav/F5-TTS-Greek/vocab.txt
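These paths can also be used from the command line. A sketch of an F5-TTS CLI invocation, assuming the upstream `f5-tts_infer-cli` entry point and its `--ckpt_file`/`--vocab_file` flags (check your installed version, as flag names may differ); the reference audio and both texts are placeholders:

```shell
f5-tts_infer-cli \
  --ckpt_file "hf://PetrosStav/F5-TTS-Greek/model_325000.safetensors" \
  --vocab_file "hf://PetrosStav/F5-TTS-Greek/vocab.txt" \
  --ref_audio "ref.wav" \
  --ref_text "γεια σου κόσμε" \
  --gen_text "καλημέρα, τι κάνεις σήμερα;"
```

Note that the generated text is kept lowercase, per the uppercase-character limitation above.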

You can use any of the provided reference examples in this repo or use your own.

NOTE: In this version, the model works better with reference audio snippets from the datasets it was trained on, though it retains some of its zero-shot capability. You can still use your own voice, but it may take some trial and error.

## Training Arguments

- Learning Rate: 0.00001
- Batch Size per GPU: 3200
- Max Samples: 64
- Gradient Accumulation Steps: 1
- Max Gradient Norm: 1
- Epochs: 277
- Warmup Updates: 1274
- Save per Updates: 25000
- Last per Steps: 1000
- Mixed Precision: fp16
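For reference, these settings map onto the upstream fine-tuning script roughly as follows. This is a sketch only: the script path and argument names are assumptions based on F5-TTS's `finetune_cli`, so verify them against your installed version:

```shell
python src/f5_tts/train/finetune_cli.py \
  --learning_rate 1e-5 \
  --batch_size_per_gpu 3200 \
  --max_samples 64 \
  --grad_accumulation_steps 1 \
  --max_grad_norm 1 \
  --epochs 277 \
  --num_warmup_updates 1274 \
  --save_per_updates 25000 \
  --last_per_steps 1000 \
  --mixed_precision fp16
```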

## Links