arxiv:2411.17991
Yuxuan Wang
ColorfulAI
AI & ML interests
Multimodal Learning
Recent Activity
authored a paper 5 days ago
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
commented on a paper 25 days ago
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format