

Xilin Jiang

xi-j


AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

upvoted a paper 1 day ago

Unified Reward Model for Multimodal Understanding and Generation

authored a paper 14 days ago

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis


Organizations

None yet

xi-j's activity

upvoted 2 papers 1 day ago

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

Paper • 2503.05085 • Published 5 days ago • 43

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published 5 days ago • 95

upvoted 2 papers 14 days ago

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 16 days ago • 5

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 21 days ago • 66

upvoted a paper about 1 month ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 66

upvoted 3 papers about 2 months ago

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 88

Enhancing Human-Like Responses in Large Language Models

Paper • 2501.05032 • Published Jan 9 • 50

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 48

upvoted 3 papers 5 months ago

UniMuMo: Unified Text, Music and Motion Generation

Paper • 2410.04534 • Published Oct 6, 2024 • 19

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

Presto! Distilling Steps and Layers for Accelerating Music Generation

Paper • 2410.05167 • Published Oct 7, 2024 • 17

upvoted 8 papers 7 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 126

Foundation Models for Music: A Survey

Paper • 2408.14340 • Published Aug 26, 2024 • 44

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26, 2024 • 62

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27, 2024 • 41

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 114

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 113

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 41

Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Paper • 2408.03900 • Published Aug 7, 2024 • 10

upvoted a paper 8 months ago

Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15, 2024 • 57