Daily Papers

by AK and the research community

Mar 4

Submitted by

ykim362

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

73 authors

Submitted by

Zery

Visual-RFT: Visual Reinforcement Fine-Tuning

8 authors

Submitted by

jayw

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

9 authors

Submitted by

obiwan96

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

5 authors

Submitted by

zlzheng

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

5 authors

Submitted by

multimodalart

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

8 authors

Submitted by

XiaohuanZhou

OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

8 authors

Submitted by

aigoncharov

When an LLM is apprehensive about its answers -- and when its uncertainty is justified

5 authors

Submitted by

weigao266

Liger: Linearizing Large Language Models to Gated Recurrent Structures

5 authors

Submitted by

Haoyu0529

Speculative Ad-hoc Querying

5 authors

Submitted by

ChengsongHuang

Efficient Test-Time Scaling via Self-Calibration

5 authors

Submitted by

qian

Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions

12 authors

Submitted by

hamishivi

Large-Scale Data Selection for Instruction Tuning

5 authors

Submitted by

KaiLv

DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

4 authors

Submitted by

DeyangKong

SampleMix: A Sample-wise Pre-training Data Mixing Strategey by Coordinating Data Quality and Diversity

10 authors

Submitted by

LTT

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

10 authors

Submitted by

Yogurt928

PodAgent: A Comprehensive Framework for Podcast Generation

5 authors

Submitted by

Elfsong

CodeArena: A Collective Evaluation Platform for LLM Code Generation

8 authors

Submitted by

Ziruibest

Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia

6 authors

Submitted by

WenhaoWang

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

2 authors

Submitted by

hanseungwook

General Reasoning Requires Learning to Reason from the Get-go

4 authors

Submitted by

dnoever

AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding

1 author

Submitted by

SP4595

CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

10 authors

Submitted by

jaehong31

RSQ: Learning from Important Tokens Leads to Better Quantized LLMs

5 authors

Submitted by

jiwan-chung

Teaching Metric Distance to Autoregressive Multimodal Foundational Models

6 authors

Submitted by

worstcoder

Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator

7 authors

Submitted by

yxuan

Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

6 authors

Submitted by

RandomHakkaDude

Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis

5 authors