---
license: cc-by-nc-4.0
base_model:
- SWivid/F5-TTS
---

This is a pruned and reorganized version of [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS) for use with the `fairytaler` Python library, an unofficial reimplementation of F5-TTS built for fast, lightweight inference.

# Installation

Fairytaler assumes you have a working CUDA environment to install into.

```
pip install fairytaler
```

This will install [the reimplementation library](https://github.com/painebenjamin/fairytaler/).

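
Since installation assumes a working CUDA environment, a quick check beforehand can save a failed run. The sketch below assumes the environment is PyTorch-based, which this card does not state explicitly; `cuda_status` is a hypothetical helper and not part of `fairytaler`.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch is installed and can see a CUDA device."""
    # Look for PyTorch without importing it, since it may not be installed.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    # torch.cuda.is_available() returns True only if a usable CUDA device is found.
    return "cuda-available" if torch.cuda.is_available() else "cuda-unavailable"


if __name__ == "__main__":
    print(cuda_status())
```

If this prints anything other than `cuda-available`, it is probably worth fixing the CUDA setup before going further.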
# How to Use

You do not need to pre-download anything; the necessary data is downloaded at runtime.

## Command Line

Use the `fairytaler` binary from the command line like so:

```sh
fairytaler examples/reference.wav examples/reference.txt "Fairytaler is an unofficial minimal re-implementation of F5 TTS."
```

| Reference Audio | Generated Audio |
| --------------- | --------------- |
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/SBSzkafZSdjIQERVpDcqf.wav"></audio> | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/5VGepj6y7wb4qd0-p-IQq.wav"></audio> |

*Reference audio sourced from [DiPCo](https://huggingface.co/datasets/benjamin-paine/dinner-party-corpus)*

Many options are available; run `fairytaler --help` for complete documentation.

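
To drive the command from a script, one option is to wrap it with `subprocess`. This is a minimal sketch that uses only the three positional arguments shown in the example above; `build_command` and `synthesize` are hypothetical helper names, and no other flags are assumed.

```python
import shutil
import subprocess
from typing import List


def build_command(reference_audio: str, reference_text: str, text: str) -> List[str]:
    """Assemble the arguments in the order used by the example command."""
    return ["fairytaler", reference_audio, reference_text, text]


def synthesize(reference_audio: str, reference_text: str, text: str) -> None:
    """Run the CLI, raising if it is missing or exits with a non-zero status."""
    if shutil.which("fairytaler") is None:
        raise FileNotFoundError("fairytaler is not on PATH; run `pip install fairytaler` first")
    # check=True raises CalledProcessError when the command fails.
    subprocess.run(build_command(reference_audio, reference_text, text), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains spaces or punctuation.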
## Python

```py
from fairytaler import F5TTSPipeline

# Necessary data is downloaded at runtime, so nothing needs to be fetched beforehand.
pipeline = F5TTSPipeline.from_pretrained("benjamin-paine/fairytaler", device="auto")

# output_save=True saves the result to a file instead of returning the audio data.
output_wav_file = pipeline(
    text="Hello, this is some test audio!",
    reference_audio="examples/reference.wav",
    reference_text="examples/reference.txt",
    output_save=True
)
print(f"Output saved to {output_wav_file}")
```

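
The example above passes a file path for `reference_text`. If you would rather hand over the transcript itself, a small helper can read it first. This is a sketch under an assumption: that the pipeline accepts the transcript as a plain string, which this card does not confirm. `load_reference_text` is a hypothetical helper and not part of `fairytaler`.

```python
from pathlib import Path


def load_reference_text(path: str) -> str:
    """Read a transcript file and return its text without surrounding whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


# Hypothetical usage, with `pipeline` created as in the example above:
# pipeline(
#     text="Hello, this is some test audio!",
#     reference_audio="examples/reference.wav",
#     reference_text=load_reference_text("examples/reference.txt"),
# )
```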
The full execution signature is:

```py
def __call__(
    self,
    text: Union[str, List[str]],
    reference_audio: AudioType,
    reference_text: str,
    reference_sample_rate: Optional[int]=None,
    seed: SeedType=None,
    speed: float=1.0,
    sway_sampling_coef: float=-1.0,
    target_rms: float=0.1,
    cross_fade_duration: float=0.15,
    punctuation_pause_duration: float=0.10,
    num_steps: int=32,
    cfg_strength: float=2.0,
    fix_duration: Optional[float]=None,
    use_tqdm: bool=False,
    output_format: AUDIO_OUTPUT_FORMAT_LITERAL="wav",
    output_save: bool=False,
    chunk_callback: Optional[Callable[[AudioResultType], None]]=None,
    chunk_callback_format: AUDIO_OUTPUT_FORMAT_LITERAL="float",
) -> AudioResultType
```

Valid format values are `wav`, `ogg`, `flac`, `mp3`, `float`, and `int`. Passing `output_save=True` saves the output to a file; otherwise the data is returned directly.

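
Because the format is passed as a plain string, a typo only surfaces once the pipeline runs. The sketch below checks the value against the formats listed above before building the call arguments; `make_call_kwargs` and `OUTPUT_FORMATS` are hypothetical names and not part of `fairytaler`.

```python
from typing import Any, Dict

# The format values listed above.
OUTPUT_FORMATS = ("wav", "ogg", "flac", "mp3", "float", "int")


def make_call_kwargs(
    text: str,
    reference_audio: str,
    reference_text: str,
    output_format: str = "wav",
    output_save: bool = False,
) -> Dict[str, Any]:
    """Validate the output format and collect keyword arguments for the pipeline call."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(
            f"Unsupported output_format {output_format!r}; expected one of {OUTPUT_FORMATS}"
        )
    return {
        "text": text,
        "reference_audio": reference_audio,
        "reference_text": reference_text,
        "output_format": output_format,
        "output_save": output_save,
    }


# Hypothetical usage, with `pipeline` created via F5TTSPipeline.from_pretrained:
# audio = pipeline(**make_call_kwargs(
#     "Hello!", "examples/reference.wav", "examples/reference.txt", output_format="float"
# ))
```

With `output_save` left at `False`, the call returns the audio data in the requested format rather than writing a file.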
# Citations

```
@misc{chen2024f5ttsfairytalerfakesfluent,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      year={2024},
      eprint={2410.06885},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2410.06885},
}

@misc{vansegbroeck2019dipcodinnerparty,
      title={DiPCo -- Dinner Party Corpus},
      author={Maarten Van Segbroeck and Ahmed Zaid and Ksenia Kutsenko and Cirenia Huerta and Tinh Nguyen and Xuewen Luo and Björn Hoffmeister and Jan Trmal and Maurizio Omologo and Roland Maas},
      year={2019},
      eprint={1909.13447},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/1909.13447},
}
```