dialogs2-factory / README.md

naonauno

Update README.md

1b37547 verified 3 months ago


	---
	title: Amphion Vevo Voice Conversion
	emoji: 🎤
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 4.8.0
	app_file: app.py
	pinned: false
	python_version: "3.10"
	---

	# Amphion's Vevo - Voice Conversion & TTS

	This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:

	- Voice conversion (transferring both style and timbre)
	- Timbre-only conversion
	- Text-to-Speech with voice cloning

	## Usage

	1. Select a mode:
	   - Voice: convert a voice, transferring both style and timbre
	   - Timbre: convert only the timbre of the voice
	   - TTS: generate speech from text with voice cloning

	2. Upload the audio files for the selected mode:
	   - Source Audio: your input audio (voice and timbre modes)
	   - Reference Style: style reference (voice and TTS modes)
	   - Reference Timbre: voice reference (required for all modes)

	3. For TTS mode:
	   - Enter the text you want to convert to speech
	   - Optionally provide reference text
	   - Select the source and reference languages

	4. Adjust Flow Matching Steps (1-64, default: 32):
	   - Higher values generally give better quality but take longer
	   - Lower values are faster but may reduce quality

	5. Click "Generate" to create the converted audio
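
The mode and input rules in the steps above can be summarized as a small lookup table. The following is a minimal sketch, not the Space's actual `app.py`: the names `INPUTS_BY_MODE`, `missing_inputs`, and `clamp_steps` are hypothetical, and treating every listed input as mandatory is an assumption (the steps only state that Reference Timbre is required for all modes).

```python
# Hypothetical helper names; the real app.py may structure this differently.
FLOW_STEPS_MIN, FLOW_STEPS_MAX, FLOW_STEPS_DEFAULT = 1, 64, 32

# Inputs each mode uses, as listed in the usage steps.
# Assumption: every listed input is treated as mandatory for its mode.
INPUTS_BY_MODE = {
    "voice": ("source_audio", "reference_style", "reference_timbre"),
    "timbre": ("source_audio", "reference_timbre"),
    "tts": ("text", "reference_style", "reference_timbre"),
}


def missing_inputs(mode, provided):
    """Return the inputs the mode uses that are absent or empty in `provided`."""
    if mode not in INPUTS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in INPUTS_BY_MODE[mode] if not provided.get(name)]


def clamp_steps(steps=FLOW_STEPS_DEFAULT):
    """Clamp the flow matching step count to the documented 1-64 range."""
    return max(FLOW_STEPS_MIN, min(FLOW_STEPS_MAX, int(steps)))
```

For example, `missing_inputs("timbre", {"source_audio": "source.wav"})` reports that `reference_timbre` still needs to be supplied, and `clamp_steps(100)` falls back to the upper bound of 64.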

	## Sample Files

	Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
	- arabic_male.wav
	- source.wav
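
To confirm that the sample files listed above are present before using them, a quick check along these lines may help. This is a sketch only: `find_missing_samples` is a hypothetical helper, and the directory is assumed to be relative to wherever the Amphion checkout lives (`root`).

```python
from pathlib import Path

# Directory and file names as listed above; resolved relative to `root`.
SAMPLE_DIR = Path("Amphion/models/vc/vevo/wav")
SAMPLE_FILES = ("arabic_male.wav", "source.wav")


def find_missing_samples(root="."):
    """Return the listed sample files that do not exist under `root`."""
    base = Path(root) / SAMPLE_DIR
    return [name for name in SAMPLE_FILES if not (base / name).is_file()]
```

An empty list means both samples were found.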

	## Technical Requirements

	- Python 3.10+
	- CUDA-capable GPU recommended for faster inference
	- At least 12 GB of storage space for models
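
The requirements above can be checked with a short preflight script. This is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper, the 12 GB figure is interpreted as decimal gigabytes, and CUDA detection assumes PyTorch is the backend (it is skipped if `torch` is not installed).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
MIN_FREE_GB = 12  # assumption: decimal gigabytes (1 GB = 1e9 bytes)


def preflight(path="."):
    """Report whether the environment meets the requirements listed above."""
    free_gb = shutil.disk_usage(path).free / 1e9
    try:
        import torch  # optional: only used to detect a CUDA device
        has_cuda = bool(torch.cuda.is_available())
    except ImportError:
        has_cuda = False
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_FREE_GB,
        "free_gb": round(free_gb, 1),
        "cuda_available": has_cuda,
    }
```

A missing GPU is reported but not treated as a failure, since the GPU is only recommended.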

	## Models

	The application automatically downloads the required models from Hugging Face:
	- Content Tokenizer (vq32)
	- Content-Style Tokenizer (vq8192)
	- Autoregressive Transformer
	- Flow Matching Transformer
	- Vocoder
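
For readers who want to fetch the models ahead of time, the download could be sketched with `huggingface_hub.snapshot_download`. Everything specific here is an assumption rather than something this README states: the repository id `amphion/Vevo`, the folder patterns, the cache directory, and the helper names `patterns_for` and `download_models` should all be checked against the Space's `app.py`.

```python
# Assumed repository id and folder layout; verify against the Space's app.py.
REPO_ID = "amphion/Vevo"
MODEL_PATTERNS = {
    "content_tokenizer": "tokenizer/vq32/*",
    "content_style_tokenizer": "tokenizer/vq8192/*",
    "ar_transformer": "contentstyle_modeling/*",
    "fm_transformer": "acoustic_modeling/Vq8192ToMels/*",
    "vocoder": "acoustic_modeling/Vocoder/*",
}


def patterns_for(components=None):
    """Map component names to download patterns (all components if None)."""
    names = list(MODEL_PATTERNS) if components is None else list(components)
    unknown = [name for name in names if name not in MODEL_PATTERNS]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    return [MODEL_PATTERNS[name] for name in names]


def download_models(components=None, cache_dir="./ckpts/Vevo"):
    """Download the selected components and return the local snapshot path."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="model",
        cache_dir=cache_dir,
        allow_patterns=patterns_for(components),
    )
```

Calling `download_models(["vocoder"])` would fetch only the files matching the vocoder pattern, which keeps the initial download small when testing.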