Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like Paper • 2402.07383 • Published Feb 12, 2024 • 13
Matcha-TTS: A fast TTS architecture with conditional flow matching Paper • 2309.03199 • Published Sep 6, 2023 • 11
Natural language guidance of high-fidelity text-to-speech with synthetic annotations Paper • 2402.01912 • Published Feb 2, 2024 • 11
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 30
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes Paper • 2406.02897 • Published Jun 5, 2024 • 13
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published Jun 8, 2024 • 14
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published Jun 26, 2024 • 19
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published Jun 30, 2024 • 10
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published Jul 11, 2024 • 14
Efficient Audio Captioning with Encoder-Level Knowledge Distillation Paper • 2407.14329 • Published Jul 19, 2024 • 4
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis Paper • 2407.09732 • Published Jul 13, 2024 • 8
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer Paper • 2308.06873 • Published Aug 14, 2023 • 25
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency Paper • 2408.04708 • Published Aug 8, 2024 • 5
Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos Paper • 2408.10998 • Published Aug 20, 2024 • 8
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization Paper • 2408.08019 • Published Aug 15, 2024 • 10
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Paper • 2408.07547 • Published Aug 14, 2024 • 7
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold Paper • 2408.14608 • Published Aug 26, 2024 • 7
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published Jul 2, 2024 • 22
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29, 2024 • 52