S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information Paper • 2503.05085 • Published 5 days ago • 43
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 5 days ago • 95
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding Paper • 2502.16794 • Published 16 days ago • 5
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 21 days ago • 66
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 48
Presto! Distilling Steps and Layers for Accelerating Music Generation Paper • 2410.05167 • Published Oct 7, 2024 • 17
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 126
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published Aug 26, 2024 • 62
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024 • 41
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond Paper • 2408.03900 • Published Aug 7, 2024 • 10