Submitted by che111 56 MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning · 9 authors 3
Submitted by zhoutianyi 41 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts · 3 authors 5
Submitted by JiangYi 26 UniTok: A Unified Tokenizer for Visual Generation and Understanding · 8 authors 2
Submitted by BestWishYsh 25 Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think · 8 authors 3
Submitted by Guizhen 23 FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving · 9 authors 2
Submitted by akhaliq 19 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute · 10 authors 2
Submitted by shuaishuaicdp 19 CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale · 9 authors 2
Submitted by akhaliq 16 Mobius: Text to Seamless Looping Video Generation via Latent Shift · 7 authors 2
Submitted by OliverRen 14 Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation · 6 authors 2
Submitted by AlignAI 10 Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System · 6 authors 2
Submitted by keanudicap 10 Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance · 9 authors 2
Submitted by mizersy 9 SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning · 6 authors 2
Submitted by thuhsy 8 Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting · 6 authors 2
Submitted by Mihir3009 7 PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving · 14 authors 4
Submitted by akhaliq 7 R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning · 13 authors 2
Submitted by imsuperkong 4 Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling · 3 authors 2