Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published 5 days ago • 12
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System Paper • 2412.20005 • Published 6 days ago • 13
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published 4 days ago • 15
PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published 4 days ago • 14
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 4 days ago • 20
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published 4 days ago • 28
Bringing Objects to Life: 4D generation from 3D objects Paper • 2412.20422 • Published 5 days ago • 31
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published 6 days ago • 39
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 10 days ago • 59
Are Vision-Language Models Truly Understanding Multi-vision Sensor? Paper • 2412.20750 • Published 4 days ago • 12
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 7 days ago • 60
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era Paper • 2412.18702 • Published 10 days ago • 5
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging Paper • 2412.19512 • Published 7 days ago • 8
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published 7 days ago • 13
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition Paper • 2412.19712 • Published 7 days ago • 14
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Paper • 2412.18605 • Published 10 days ago • 17
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published 8 days ago • 17
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published 18 days ago • 44