arxiv:2412.03895

A Noise is Worth Diffusion Guidance

Published on Dec 5, 2024 · Submitted by sunovivid on Dec 6

Abstract

Diffusion models excel at generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to 'guidance-free noise', we uncover that low-magnitude, low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving inference throughput and memory usage. Expanding on this, we propose NoiseRefine, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.
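
At a high level, the paper replaces the two denoiser passes per step required by CFG with a one-time refinement of the initial noise, after which the standard pipeline runs unguided. The sketch below is a minimal, hypothetical illustration of that contrast, not the authors' implementation: `denoiser`, `scheduler`, and `noise_refiner` are stand-in components, and the scheduler interface is assumed.

```python
import torch

@torch.no_grad()
def sample_with_cfg(denoiser, scheduler, text_emb, null_emb, shape, scale=7.5):
    """Classifier-free guidance: two denoiser passes per step, mixed by a guidance scale."""
    x = torch.randn(shape)
    for t in scheduler.timesteps:
        eps_cond = denoiser(x, t, text_emb)
        eps_uncond = denoiser(x, t, null_emb)
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
        x = scheduler.step(eps, t, x)  # one denoising update (interface assumed)
    return x

@torch.no_grad()
def sample_with_refined_noise(denoiser, scheduler, noise_refiner, text_emb, shape):
    """Guidance-free sampling: refine the seed noise once, then run the unchanged
    pipeline with a single conditional denoiser pass per step."""
    x = noise_refiner(torch.randn(shape), text_emb)  # single refinement of the initial noise
    for t in scheduler.timesteps:
        eps = denoiser(x, t, text_emb)               # no unconditional pass, no guidance scale
        x = scheduler.step(eps, t, x)
    return x
```

Because the refinement happens once before the denoising loop, each step needs only a single forward pass, which is where the inference throughput and memory savings mentioned in the abstract come from.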

Community

Paper submitter

If you are interested, please check out the project page: https://cvlab-kaist.github.io/NoiseRefine/.
