HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 13 days ago • 87
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 11 days ago • 70
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 5 days ago • 44
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 6 days ago • 83
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 16 days ago • 34
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 15 days ago • 28
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published 15 days ago • 41
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 15 days ago • 44
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published 19 days ago • 84
In Case You Missed It: ARC 'Challenge' Is Not That Challenging Paper • 2412.17758 • Published 15 days ago • 16
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding Paper • 2412.18450 • Published 14 days ago • 32
Large Action Models: From Inception to Implementation Paper • 2412.10047 • Published 25 days ago • 32
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 25 days ago • 136