F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Generate realistic voice audio from text and audio prompts
An end-to-end (e2e) voice language model by Fish Audio.
Vote on the latest TTS models!
KE-Omni
Interact with a multimodal chatbot using text and audio
Interact with images using text prompts