---
license: cc-by-nc-4.0
datasets:
- amphion/Emilia-Dataset
- mozilla-foundation/common_voice_12_0
language:
- el
- en
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
---

F5-TTS model finetuned to speak Greek.

(This work is under development and the model is a beta version.)

Finetuned on Greek speech datasets, plus a small part of the English portion of the Emilia dataset (Emilia-EN) to prevent catastrophic forgetting of English.
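
The card does not say how the English shard was weighted against the Greek data. The sketch below assumes a simple fixed-fraction mix, which is one common way to rehearse a previously learned language while finetuning on a new one. `Clip`, `build_mixed_manifest`, and `english_fraction` are hypothetical names for illustration, not part of F5-TTS or its training scripts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """One training utterance (hypothetical record type)."""
    audio_path: str
    text: str
    lang: str  # "el" for Greek, "en" for English


def build_mixed_manifest(greek, english, english_fraction=0.1, seed=0):
    """Return all Greek clips plus a random English subset, shuffled.

    english_fraction is the target share of English clips in the result.
    It is a hypothetical knob: the model card does not state the ratio used.
    """
    if not 0.0 <= english_fraction < 1.0:
        raise ValueError("english_fraction must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed -> reproducible subset and order
    # Solve n_en / (n_el + n_en) = f for n_en, capped by what is available.
    n_en = round(english_fraction * len(greek) / (1.0 - english_fraction))
    n_en = min(n_en, len(english))
    mixed = list(greek) + rng.sample(list(english), n_en)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    greek = [Clip(f"el/{i}.wav", "καλημέρα", "el") for i in range(90)]
    english = [Clip(f"en/{i}.wav", "good morning", "en") for i in range(50)]
    mixed = build_mixed_manifest(greek, english, english_fraction=0.1)
    print(len(mixed), sum(c.lang == "en" for c in mixed))  # 100 10
```

Keeping every Greek clip and sampling only the English side means the new language is never down-weighted; the English subset just needs to be large enough to keep the earlier behaviour alive.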

The model can generate Greek speech from Greek text with a Greek reference audio clip, English speech from English text with an English reference, and a mix of Greek and English. Quality on mixed-language text still needs improvement, and several runs may be needed to get a good result.
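
Since the reference audio should match the language of the text, a small helper that detects which script the text uses can automate that choice. This is a hypothetical sketch: `scripts_in` and `pick_reference` are not part of F5-TTS, the helper treats any ASCII letter as English, and falling back to the Greek reference for mixed text is an arbitrary assumption rather than a recommendation from this card.

```python
# Unicode blocks for Greek: "Greek and Coptic" and "Greek Extended" (polytonic).
GREEK_BLOCKS = ((0x0370, 0x03FF), (0x1F00, 0x1FFF))


def scripts_in(text):
    """Return the set of languages ('el', 'en') whose letters appear in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in GREEK_BLOCKS):
            found.add("el")
        elif ch.isascii():
            found.add("en")  # assumption: Latin ASCII letters mean English
    return found


def pick_reference(text, refs):
    """Pick a reference clip whose language matches text.

    refs maps "el" and "en" to reference audio paths (hypothetical inputs).
    Returns (path, is_mixed). Mixed or undetected text falls back to the
    Greek reference, which is an arbitrary choice.
    """
    langs = scripts_in(text)
    if langs == {"en"}:
        return refs["en"], False
    return refs["el"], langs == {"el", "en"}
```

The `is_mixed` flag can tell a caller when to generate several candidates and keep the best one, since mixed-language output is the least reliable case.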

The training data consists of:

- Common Voice 12.0, all Greek splits (https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0)
- Greek Single Speaker Speech (https://www.kaggle.com/datasets/bryanpark/greek-single-speaker-speech-dataset)
- A small part of the Emilia dataset, the shard EN-B000049.tar (https://huggingface.co/datasets/amphion/Emilia-Dataset)

GitHub: https://github.com/SWivid/F5-TTS

Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching