Submitted by ykim362 58 Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs · 73 authors 6
Submitted by jayw 35 Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models · 9 authors 2
Submitted by obiwan96 23 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs · 5 authors 3
Submitted by zlzheng 22 From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens · 5 authors 2
Submitted by multimodalart 22 DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion · 8 authors 2
Submitted by XiaohuanZhou 20 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment · 8 authors 2
Submitted by aigoncharov 18 When an LLM is apprehensive about its answers -- and when its uncertainty is justified · 5 authors 2
Submitted by weigao266 13 Liger: Linearizing Large Language Models to Gated Recurrent Structures · 5 authors 2
Submitted by qian 11 Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions · 12 authors 2
Submitted by KaiLv 9 DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting · 4 authors 2
Submitted by DeyangKong 8 SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity · 10 authors 2
Submitted by LTT 7 Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation · 10 authors 2
Submitted by Elfsong 6 CodeArena: A Collective Evaluation Platform for LLM Code Generation · 8 authors 2
Submitted by Ziruibest 5 Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia · 6 authors 2
Submitted by WenhaoWang 5 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation · 2 authors 2
Submitted by hanseungwook 4 General Reasoning Requires Learning to Reason from the Get-go · 4 authors 2
Submitted by dnoever 4 AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding · 1 author 2
Submitted by SP4595 3 CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments · 10 authors 2
Submitted by jaehong31 2 RSQ: Learning from Important Tokens Leads to Better Quantized LLMs · 5 authors 3
Submitted by jiwan-chung 2 Teaching Metric Distance to Autoregressive Multimodal Foundational Models · 6 authors 2
Submitted by worstcoder 2 Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator · 7 authors 2
Submitted by yxuan 2 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model · 6 authors 2
Submitted by RandomHakkaDude 1 Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis · 5 authors 2