---
title: Amphion Vevo Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
python_version: "3.10"
---
# Amphion's Vevo - Voice Conversion & TTS

This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:

- Voice conversion (transferring both style and timbre)
- Timbre-only conversion
- Text-to-Speech with voice cloning
## Usage

1. Select a mode:
   - **Voice**: Convert a voice with both style and timbre transfer
   - **Timbre**: Convert only the timbre of the voice
   - **TTS**: Generate speech from text with voice cloning
2. Upload the audio files required by the selected mode:
   - Source Audio: Your input audio (Voice and Timbre modes)
   - Reference Style: Style reference (Voice and TTS modes)
   - Reference Timbre: Voice reference (required in all modes)
3. For TTS mode:
   - Enter the text you want to convert to speech
   - Optionally provide the reference text
   - Select the source and reference languages
4. Adjust Flow Matching Steps (1 to 64, default: 32):
   - Higher values give better quality but take longer
   - Lower values are faster but may reduce quality
5. Click "Generate" to create the output audio
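The input rules in the steps above can be summarized as a small validation routine. The sketch below is illustrative only: the `Request` dataclass, the `REQUIRED_INPUTS` table, and the `validate` function are hypothetical names, not the app's actual code. It assumes mode names are matched case-insensitively and that the 1 to 64 step range is inclusive.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table mirroring the Usage section: inputs each mode needs.
REQUIRED_INPUTS = {
    "voice": {"source_audio", "reference_style", "reference_timbre"},
    "timbre": {"source_audio", "reference_timbre"},
    "tts": {"text", "reference_style", "reference_timbre"},
}

# Flow Matching Steps range and default, as stated in the Usage section.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


@dataclass
class Request:
    """Hypothetical container for the values a user enters in the UI."""
    mode: str
    source_audio: Optional[str] = None
    reference_style: Optional[str] = None
    reference_timbre: Optional[str] = None
    text: Optional[str] = None
    reference_text: Optional[str] = None  # optional in TTS mode
    flow_matching_steps: int = DEFAULT_STEPS


def validate(req: Request) -> list[str]:
    """Return a list of problems; an empty list means the request is usable."""
    mode = req.mode.lower()
    if mode not in REQUIRED_INPUTS:
        return [f"unknown mode: {req.mode!r}"]
    # Report every required input that is missing or empty, in stable order.
    problems = [
        f"missing {name} for {mode} mode"
        for name in sorted(REQUIRED_INPUTS[mode])
        if not getattr(req, name)
    ]
    if not MIN_STEPS <= req.flow_matching_steps <= MAX_STEPS:
        problems.append(
            f"flow_matching_steps must be in [{MIN_STEPS}, {MAX_STEPS}], "
            f"got {req.flow_matching_steps}"
        )
    return problems


if __name__ == "__main__":
    # Timbre mode without a timbre reference is rejected.
    print(validate(Request(mode="Timbre", source_audio="source.wav")))
    # → ['missing reference_timbre for timbre mode']
```

Collecting all problems into a list, rather than raising on the first one, lets a UI show every missing field at once.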
## Sample Files

Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:

- arabic_male.wav
- source.wav
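To inspect the sample files before uploading them, a minimal sketch using only Python's standard `wave` module is shown below. It assumes the script is run from the repository root, so that the relative path above resolves, and that the files are uncompressed PCM WAV. The `wave` module cannot read other encodings, such as 32-bit float WAV. `describe_wav` is a hypothetical helper name.

```python
import wave
from pathlib import Path

# Relative path taken from this README; assumes the working directory is the repo root.
SAMPLE_DIR = Path("Amphion/models/vc/vevo/wav")


def describe_wav(path) -> dict:
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
        }


if __name__ == "__main__":
    for name in ("arabic_male.wav", "source.wav"):
        path = SAMPLE_DIR / name
        if path.exists():
            print(name, describe_wav(path))
        else:
            print(name, "not found (run from the repository root)")
```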
## Technical Requirements

- Python 3.10+
- CUDA-capable GPU recommended for faster inference
- At least 12 GB of free storage for model checkpoints
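The requirements above can be checked locally before launching the app. This is a hedged sketch, not part of the project. `check_environment` is a hypothetical name, and "12 GB" is read here as 12 × 1024³ bytes, which is slightly stricter than 12 × 10⁹. PyTorch is imported only if it is installed, and only to ask whether a CUDA GPU is visible.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
# Assumption: "12 GB" interpreted as 12 GiB (slightly conservative).
MIN_FREE_BYTES = 12 * 1024**3


def check_environment(model_dir: str = ".") -> dict:
    """Report whether Python version, free disk space, and CUDA meet the README's requirements."""
    report = {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "disk_ok": shutil.disk_usage(model_dir).free >= MIN_FREE_BYTES,
    }
    try:
        import torch  # optional: only used to detect a CUDA-capable GPU
        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        report["cuda_available"] = False  # PyTorch not installed
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A missing GPU is reported rather than treated as an error, because the README lists CUDA as recommended, not required.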
## Models

The application automatically downloads the required models from Hugging Face:

- Content Tokenizer (vq32)
- Content-Style Tokenizer (vq8192)
- Autoregressive Transformer
- Flow Matching Transformer
- Vocoder