HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs • Paper • 2412.18925
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models • Paper • 2412.18609
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation • Paper • 2412.18176
Revisiting In-Context Learning with Long Context Language Models • Paper • 2412.16926
No More Adam: Learning Rate Scaling at Initialization is All You Need • Paper • 2412.11768
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition • Paper • 2412.09501
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions • Paper • 2412.08737
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions • Paper • 2412.09596
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities • Paper • 2412.07769
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360
VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • 2412.05271