---
title: Amphion Vevo Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
python_version: "3.10"
---
# Amphion's Vevo - Voice Conversion & TTS
This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:
- Voice conversion (transferring both style and timbre)
- Timbre-only conversion
- Text-to-Speech with voice cloning
## Usage
1. Select a mode:
   - **Voice**: Convert voice with both style and timbre transfer
   - **Timbre**: Convert only the timbre of the voice
   - **TTS**: Generate speech from text with voice cloning
2. Upload audio files based on the mode:
   - Source Audio: Your input audio (for Voice and Timbre modes)
   - Reference Style: Style reference (for Voice and TTS modes)
   - Reference Timbre: Voice reference (required for all modes)
3. For TTS mode:
   - Enter the text you want to convert to speech
   - Optionally provide reference text
   - Select source and reference languages
4. Adjust Flow Matching Steps (1-64, default: 32):
   - Higher values give better quality but take longer
   - Lower values are faster but may reduce quality
5. Click "Generate" to create the converted audio
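
The input rules in the steps above can be summarized as a small lookup table. The following is a minimal sketch, not the code in `app.py`: the mode keys, the field names (`source_audio`, `reference_style`, `reference_timbre`, `text`), and both helper functions are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional

# Required inputs per mode, transcribed from the usage steps.
# Keys and field names are illustrative, not the app's real identifiers.
REQUIRED_INPUTS: Dict[str, List[str]] = {
    "voice": ["source_audio", "reference_style", "reference_timbre"],
    "timbre": ["source_audio", "reference_timbre"],
    "tts": ["text", "reference_style", "reference_timbre"],
}

# Flow Matching Steps range and default, as stated in step 4.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


def missing_inputs(mode: str, provided: Dict[str, Optional[str]]) -> List[str]:
    """Return the required inputs that are absent or empty for `mode`."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_INPUTS[mode] if not provided.get(name)]


def clamp_steps(steps: Optional[int] = None) -> int:
    """Use the default when unset, otherwise keep the value within 1-64."""
    if steps is None:
        return DEFAULT_STEPS
    return max(MIN_STEPS, min(MAX_STEPS, int(steps)))
```

For example, `missing_inputs("timbre", {"source_audio": "source.wav"})` reports that the timbre reference is still needed, since Reference Timbre is required in every mode.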
## Sample Files
Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
- `arabic_male.wav`
- `source.wav`
## Technical Requirements
- Python 3.10+
- CUDA-capable GPU recommended for faster inference
- At least 12 GB of free storage for the downloaded models
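
Before a local run, these requirements can be checked with a short preflight script. This is a sketch under a few assumptions: `check_environment` is a hypothetical helper (not part of the app), the 12 GB figure is treated as gibibytes of free space on the drive holding the given path, and PyTorch is used for GPU detection only if it is already installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # from "Python 3.10+"
MIN_FREE_GB = 12       # from the storage requirement (assumed to mean GiB)


def check_environment(path: str = ".") -> dict:
    """Report whether the interpreter, disk, and GPU meet the listed requirements."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    try:
        import torch  # optional: only used to detect a CUDA device
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False  # without PyTorch we cannot detect a GPU here
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_GB,
        "cuda_available": has_cuda,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `cuda_available` value of `False` is not a failure: the GPU is only recommended, so inference should still run on CPU, just more slowly.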
## Models
The application automatically downloads required models from Hugging Face:
- Content Tokenizer (vq32)
- Content-Style Tokenizer (vq8192)
- Autoregressive Transformer
- Flow Matching Transformer
- Vocoder