---
title: Amphion Vevo Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
python_version: "3.10"
---

# Amphion's Vevo - Voice Conversion & TTS

This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:

- Voice conversion (transferring both style and timbre)
- Timbre-only conversion
- Text-to-Speech with voice cloning

## Usage

1. Select mode:
   - **Voice**: Convert voice with both style and timbre transfer
   - **Timbre**: Convert only the timbre of the voice
   - **TTS**: Generate speech from text with voice cloning

2. Upload the audio files required by the selected mode:
   - Source Audio: Your input audio (for Voice and Timbre modes)
   - Reference Style: Audio whose speaking style is transferred (for Voice and TTS modes)
   - Reference Timbre: Audio of the target voice whose timbre is applied (required for all modes)

3. For TTS mode:
   - Enter the text you want to convert to speech
   - Optionally provide reference text
   - Select source and reference languages

4. Adjust Flow Matching Steps (1-64, default: 32)
   - Higher values give better quality but take longer
   - Lower values are faster but may reduce quality

5. Click "Generate" to create the converted audio
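The mode-dependent inputs and the step range described above can be summarized as a small validation sketch. The `Request` fields, the mode strings, and the `validate` function are hypothetical names chosen for illustration; they are not the actual `app.py` API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mode names mirroring the UI options. Which inputs each mode
# needs follows the usage steps above (reference text is optional for TTS).
REQUIRED_INPUTS = {
    "Voice": {"source_audio", "reference_style", "reference_timbre"},
    "Timbre": {"source_audio", "reference_timbre"},
    "TTS": {"reference_style", "reference_timbre", "text"},
}

MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


@dataclass
class Request:
    # Illustrative fields only; the real app may name or structure them differently.
    mode: str
    source_audio: Optional[str] = None       # path to the source wav
    reference_style: Optional[str] = None    # path to the style reference wav
    reference_timbre: Optional[str] = None   # path to the timbre reference wav
    text: Optional[str] = None               # text to synthesize (TTS only)
    reference_text: Optional[str] = None     # optional transcript of the style reference
    flow_matching_steps: int = DEFAULT_STEPS


def validate(req: Request) -> list[str]:
    """Return a list of problems; an empty list means the request looks usable."""
    if req.mode not in REQUIRED_INPUTS:
        return [f"unknown mode: {req.mode!r}"]
    # Report every required input that is missing or empty, in a stable order.
    problems = [
        f"missing {name} for {req.mode} mode"
        for name in sorted(REQUIRED_INPUTS[req.mode])
        if not getattr(req, name)
    ]
    # Flow Matching Steps must stay inside the slider range.
    if not MIN_STEPS <= req.flow_matching_steps <= MAX_STEPS:
        problems.append(
            f"flow_matching_steps must be in [{MIN_STEPS}, {MAX_STEPS}], "
            f"got {req.flow_matching_steps}"
        )
    return problems


if __name__ == "__main__":
    req = Request(mode="Timbre", source_audio="source.wav", flow_matching_steps=80)
    for problem in validate(req):
        print(problem)
```

Keeping the per-mode requirements in a single table means the same data could drive both input validation and which upload fields the UI shows for each mode.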

## Sample Files

Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
- arabic_male.wav
- source.wav

## Technical Requirements

- Python 3.10+
- CUDA-capable GPU recommended for faster inference
- At least 12 GB of free storage for the downloaded models
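A quick pre-flight check for these requirements might look like the sketch below. It assumes "12 GB" means 12 GiB of free space in the directory where models are stored, and treats PyTorch as optional (it is only used to detect a CUDA GPU). `check_environment` is a hypothetical helper, not part of the app.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
# Assumption: "12 GB" is read as 12 GiB, which is slightly stricter than 12 * 10**9 bytes.
MIN_FREE_BYTES = 12 * 1024**3


def check_environment(model_dir: str = ".") -> dict:
    """Report whether this machine meets the stated requirements."""
    report = {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "free_bytes": shutil.disk_usage(model_dir).free,
    }
    report["disk_ok"] = report["free_bytes"] >= MIN_FREE_BYTES
    try:
        import torch  # optional dependency, used only for GPU detection
        report["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        # Without PyTorch we cannot tell, so report no GPU (CPU inference is slower).
        report["cuda"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```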

## Models

The application automatically downloads required models from Hugging Face:
- Content Tokenizer (vq32)
- Content-Style Tokenizer (vq8192)
- Autoregressive Transformer
- Flow Matching Transformer
- Vocoder
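To pre-fetch the weights yourself instead of waiting for the first run, a minimal sketch with `huggingface_hub.snapshot_download` follows. The repository ID `amphion/Vevo`, the folder patterns, the component names, and the `./ckpts/Vevo` cache directory are all assumptions for illustration; check the model card for the actual layout before relying on them.

```python
from typing import Optional

# ASSUMPTION: repo ID and folder layout are illustrative, not confirmed by this README.
REPO_ID = "amphion/Vevo"
COMPONENT_PATTERNS = {
    "content_tokenizer_vq32": ["tokenizer/vq32/*"],
    "content_style_tokenizer_vq8192": ["tokenizer/vq8192/*"],
    "ar_transformer": ["contentstyle_modeling/*"],
    "flow_matching_transformer": ["acoustic_modeling/Vq8192ToMels/*"],
    "vocoder": ["acoustic_modeling/Vocoder/*"],
}


def patterns_for(components: Optional[list[str]] = None) -> list[str]:
    """Collect allow_patterns for the requested components (all of them by default)."""
    names = components if components is not None else list(COMPONENT_PATTERNS)
    unknown = [n for n in names if n not in COMPONENT_PATTERNS]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    return [p for n in names for p in COMPONENT_PATTERNS[n]]


def download_models(cache_dir: str = "./ckpts/Vevo",
                    components: Optional[list[str]] = None) -> str:
    """Fetch only the matching files and return the local snapshot directory."""
    # Imported lazily so patterns_for() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="model",
        cache_dir=cache_dir,
        allow_patterns=patterns_for(components),
    )
```

Filtering with `allow_patterns` downloads only the listed folders, which helps when a single mode (for example Timbre) does not need every component.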