Daily Papers

by AK and the research community

Mar 14

Submitted by

akhaliq

Transformers without Normalization

·
5 authors

3

Submitted by

zhoutianyi

CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

·
4 authors

Submitted by

Eliahu

Charting and Navigating Hugging Face's Model Atlas

·
5 authors

4

Submitted by

sinwang

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

·
7 authors

6

Submitted by

LucasFang

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

·
12 authors

2

Submitted by

agwmon

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

·
5 authors

2

Submitted by

Owen777

CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

·
7 authors

Submitted by

Weiyun1025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

·
15 authors

3

Submitted by

EthanTaylor

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

·
8 authors

2

Submitted by

yeates

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

·
4 authors

2

Submitted by

wondervictor

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

·
10 authors

2

Submitted by

mozhu

Shifting Long-Context LLMs Research from Input to Output

·
7 authors

2

Submitted by

ChenyangLyu

New Trends for Modern Machine Translation with Large Reasoning Models

·
6 authors

2

Submitted by

wenhu

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

·
7 authors

2

Submitted by

yyf86

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

·
9 authors

Submitted by

akhaliq

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

·
14 authors

Submitted by

akhaliq

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

·
32 authors

Submitted by

VityaVitalich

Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark

·
6 authors

2

Submitted by

RohitGandikota

Distilling Diversity and Control in Diffusion Models

·
2 authors

2

Submitted by

akhaliq

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

·
12 authors

Submitted by

akhaliq

Long Context Tuning for Video Generation

·
8 authors

Submitted by

ArthurDouillard

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

·
8 authors

Submitted by

BestWishYsh

CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance

·
10 authors

2

Submitted by

sayakpaul

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

·
9 authors

4

Submitted by

xuxw98

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

·
6 authors

2

Submitted by

hp-l33

Autoregressive Image Generation with Randomized Parallel Decoding

·
4 authors

2

Submitted by

allisonandreyev

Quantization for OpenAI's Whisper Models: A Comparative Analysis

·
1 author

2

Submitted by

Zc0in

Discovering Influential Neuron Path in Vision Transformers

·
8 authors

2

Submitted by

chenblin26

ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

·
6 authors

2

Submitted by

AhmadMustafa

On the Limitations of Vision-Language Models in Understanding Image Transforms

·
3 authors

Submitted by

kfirgold99

Piece it Together: Part-Based Concepting with IP-Priors

·
4 authors

Submitted by

gabrielchua

MinorBench: A hand-built benchmark for content-based risks for children

·
3 authors

3

Submitted by

hkchengrex

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

·
2 authors

2

Submitted by

jhao

TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention

·
9 authors

Submitted by

imranraad

"Silent Is Not Actually Silent": An Investigation of Toxicity on Bug Report Discussion

·
2 authors

2

Submitted by

xzhao

Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective

·
2 authors

Submitted by

Jason0214

A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1

·
5 authors

Submitted by

Nikolai10

PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling

·
6 authors

2

Submitted by

alandao

PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

·
4 authors