F5-TTS Spanish Language Model

Overview

This model is a fine-tune of F5-TTS for Spanish speech synthesis. The project aims to deliver high-quality synthesis that covers a range of regional Spanish accents.

License

This model is released under the CC0-1.0 license, which allows for free usage, modification, and distribution.

Datasets

The following datasets were used for training:

VoxPopuli dataset, with mainly Peninsular Spanish accents
Crowdsourced high-quality Spanish speech data:
- Argentinian Spanish
- Chilean Spanish
- Colombian Spanish
- Peruvian Spanish
- Puerto Rican Spanish
- Venezuelan Spanish
TEDx Spanish Corpus

Additional sources:

Model Information

Base Model: SWivid/F5-TTS
Total Training Data: 218 hours of audio
Training Configuration:

Batch Size: 3200
Max Samples: 64
Training Steps: 1,200,000

Usage Instructions

Method 0: Use the Hugging Face Space (https://huggingface.co/spaces/jpgallegoar/Spanish-F5)

Method 1: Manual Model Replacement

Run the F5-TTS Application: Start the F5-TTS application and observe the terminal for output indicating the model file path. It should appear similar to:

model : C:\Users\thega\.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\995ff41929c08ff968786b448a384330438b5cb6\F5TTS_Base\model_1200000.safetensors

Replace the Model File:
- Navigate to the displayed file location.
- Rename the existing model file to model_1200000.safetensors.bak.
- Download model_1200000.safetensors from this repository and save it to the same location.
Restart the Application: Relaunch the F5-TTS application to load the updated model.
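
The rename-and-replace steps above can also be scripted. The following is a minimal sketch using only the Python standard library. `replace_model` is a hypothetical helper, not part of F5-TTS, and both arguments are placeholders: the first is the path printed in your terminal, the second is wherever you saved the downloaded file.

```python
import shutil
from pathlib import Path


def replace_model(installed, downloaded):
    """Back up the installed checkpoint, then copy the downloaded one into its place.

    Hypothetical helper for illustration only. Returns the path of the backup file.
    """
    installed = Path(installed)
    downloaded = Path(downloaded)

    if not installed.is_file():
        raise FileNotFoundError(f"installed model not found: {installed}")
    if not downloaded.is_file():
        raise FileNotFoundError(f"downloaded model not found: {downloaded}")

    # model_1200000.safetensors -> model_1200000.safetensors.bak
    backup = installed.with_name(installed.name + ".bak")
    if backup.exists():
        # Refuse to overwrite an earlier backup of the original model.
        raise FileExistsError(f"backup already exists: {backup}")

    installed.rename(backup)
    shutil.copy2(downloaded, installed)
    return backup
```

To roll back, delete the new file and rename the `.bak` file to its original name.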

Alternative Methods

GitHub Repository: Clone the Spanish-F5 repository and follow the provided installation instructions.
Google Colab: Use the model via Google Colab.
- Runtime -> Change Runtime Type -> T4 GPU
- Runtime -> Run all
- Click on the link shown in "Running on public URL: https://link.gradio.live" when it loads
Jupyter Notebook: Run the model through the Spanish_F5.ipynb notebook.

Contributions and Recommendations

This model may benefit from further fine-tuning to enhance its performance across different Spanish dialects. Contributions from the community are encouraged. For optimal output quality, preprocess the reference audio by removing background noise, balancing audio levels, and enhancing clarity.
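
As an illustration of the level-balancing step, the sketch below peak-normalizes a 16-bit PCM WAV file using only the Python standard library. The function names, the 0.9 target peak, and the restriction to 16-bit PCM are assumptions made for this example and are not part of F5-TTS. Noise removal is out of scope here and usually needs a dedicated tool.

```python
import array
import sys
import wave


def peak_normalize(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit samples so the loudest one sits at target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # silence: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp to the int16 range in case rounding pushes a value past it.
    return array.array(
        "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
    )


def normalize_wav(src, dst, target_peak=0.9):
    """Read a 16-bit PCM WAV file, peak-normalize it, and write the result to dst."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(reader.readframes(params.nframes))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    normalized = peak_normalize(samples, target_peak)

    if sys.byteorder == "big":
        normalized.byteswap()
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)
        writer.writeframes(normalized.tobytes())
```

Peak normalization only adjusts overall gain. It does not even out loudness within a clip, so a recording with large level swings may still need compression or manual editing.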
