Collections
Collections including paper arxiv:2406.03344
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 18
- Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
  Paper • 2406.07522 • Published • 36
- VSSD: Vision Mamba with Non-Causal State Space Duality
  Paper • 2407.18559 • Published • 17

- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 10
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 13
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 18

- Music Consistency Models
  Paper • 2404.13358 • Published • 12
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
  Paper • 2407.02869 • Published • 18
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 13
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 18

- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 188
- MusicHiFi: Fast High-Fidelity Stereo Vocoding
  Paper • 2403.10493 • Published • 16
- Music Consistency Models
  Paper • 2404.13358 • Published • 12
- Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
  Paper • 2406.02430 • Published • 29