MinMo: A Multimodal Large Language Model for Seamless Voice Interaction • Paper • 2501.06282 • Published 7 days ago • 31
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning • Paper • 2501.04698 • Published 9 days ago • 14
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning • Paper • 2412.11974 • Published Dec 16, 2024 • 9
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization • Paper • 2412.18525 • Published 24 days ago • 70
Customized Generation Reimagined: Fidelity and Editability Harmonized • Paper • 2412.04831 • Published Dec 6, 2024 • 1
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation • Paper • 2412.18176 • Published 25 days ago • 15
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression • Paper • 2412.17483 • Published 26 days ago • 31