---
license: cc-by-nc-4.0
base_model:
- SWivid/F5-TTS
---

This is a pruned and re-organized version of [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS), made to be used with the `fairytaler` Python library, an unofficial reimplementation of F5-TTS built for fast and lightweight inference.

# Installation

Fairytaler assumes you have a working CUDA environment to install into.

```
pip install fairytaler
```

This will install [the reimplementation library](https://github.com/painebenjamin/fairytaler/).

# How to Use

You do not need to pre-download anything; the necessary data is downloaded at runtime.

## Command Line

Use the `fairytaler` binary from the command line like so:

```sh
fairytaler examples/reference.wav examples/reference.txt "Fairytaler is an unofficial minimal re-implementation of F5 TTS."
```

| Reference Audio | Generated Audio |
| --------------- | --------------- |
| | |

*Reference audio sourced from [DiPCo](https://huggingface.co/datasets/benjamin-paine/dinner-party-corpus)*

Many options are available; for complete documentation, run `fairytaler --help`.

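For scripted or batch use, the same command can also be launched from Python. The following is a minimal sketch, assuming only the three positional arguments used in the command above (reference audio, reference transcript, text to synthesize). `build_fairytaler_command` and `synthesize` are hypothetical helper names, not part of the library, and any further options should be taken from `fairytaler --help` rather than from this example.

```python
import shutil
import subprocess
from typing import List


def build_fairytaler_command(reference_audio: str, reference_text: str, text: str) -> List[str]:
    """Build the argument list for one CLI call (hypothetical helper)."""
    # Positional order follows the documented example:
    # reference audio, reference transcript, text to synthesize.
    return ["fairytaler", reference_audio, reference_text, text]


def synthesize(reference_audio: str, reference_text: str, text: str) -> None:
    """Run the CLI once; raises CalledProcessError on a non-zero exit code."""
    if shutil.which("fairytaler") is None:
        raise RuntimeError("fairytaler not found on PATH; run `pip install fairytaler` first")
    # Passing a list (and no shell=True) avoids quoting problems in the text.
    command = build_fairytaler_command(reference_audio, reference_text, text)
    subprocess.run(command, check=True)
```

Calling `synthesize("examples/reference.wav", "examples/reference.txt", "Some text to speak.")` in a loop over several sentences would then run one CLI invocation per sentence.
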
## Python

```py
from fairytaler import F5TTSPipeline

# Model data is downloaded at runtime the first time it is needed.
pipeline = F5TTSPipeline.from_pretrained("benjamin-paine/fairytaler", device="auto")

# With output_save=True the result is saved to a file instead of being returned as data.
output_wav_file = pipeline(
    text="Hello, this is some test audio!",
    reference_audio="examples/reference.wav",
    reference_text="examples/reference.txt",
    output_save=True
)
print(f"Output saved to {output_wav_file}")
```

The full execution signature is:

```py
def __call__(
    self,
    text: Union[str, List[str]],
    reference_audio: AudioType,
    reference_text: str,
    reference_sample_rate: Optional[int]=None,
    seed: SeedType=None,
    speed: float=1.0,
    sway_sampling_coef: float=-1.0,
    target_rms: float=0.1,
    cross_fade_duration: float=0.15,
    punctuation_pause_duration: float=0.10,
    num_steps: int=32,
    cfg_strength: float=2.0,
    fix_duration: Optional[float]=None,
    use_tqdm: bool=False,
    output_format: AUDIO_OUTPUT_FORMAT_LITERAL="wav",
    output_save: bool=False,
    chunk_callback: Optional[Callable[[AudioResultType], None]]=None,
    chunk_callback_format: AUDIO_OUTPUT_FORMAT_LITERAL="float",
) -> AudioResultType
```

Format values are `wav`, `ogg`, `flac`, `mp3`, `float` and `int`. Passing `output_save=True` saves the result to a file; omitting it returns the data directly.

# Citations

```
@misc{chen2024f5ttsfairytalerfakesfluent,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      year={2024},
      eprint={2410.06885},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2410.06885},
}

@misc{vansegbroeck2019dipcodinnerparty,
      title={DiPCo -- Dinner Party Corpus},
      author={Maarten Van Segbroeck and Ahmed Zaid and Ksenia Kutsenko and Cirenia Huerta and Tinh Nguyen and Xuewen Luo and Björn Hoffmeister and Jan Trmal and Maurizio Omologo and Roland Maas},
      year={2019},
      eprint={1909.13447},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/1909.13447},
}
```