Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
Abstract
Diffusion models (DMs) excel in photorealism, image editing, and solving inverse problems, aided by classifier-free guidance and image inversion techniques. However, rectified flow models (RFMs) remain underexplored for these tasks. Existing DM-based methods often require additional training, lack generalization to pretrained latent models, underperform, and demand significant computational resources due to extensive backpropagation through ODE solvers and inversion processes. In this work, we first develop a theoretical and empirical understanding of the vector field dynamics of RFMs in efficiently guiding the denoising trajectory. Our findings reveal that we can navigate the vector field in a deterministic and gradient-free manner. Utilizing this property, we propose FlowChef, which leverages the vector field to steer the denoising trajectory for controlled image generation tasks, facilitated by gradient skipping. FlowChef is a unified framework for controlled image generation that, for the first time, simultaneously addresses classifier guidance, linear inverse problems, and image editing without the need for extra training, inversion, or intensive backpropagation. Finally, we perform extensive evaluations and show that FlowChef significantly outperforms baselines in terms of performance, memory, and time requirements, achieving new state-of-the-art results. Project Page: https://flowchef.github.io.
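The abstract's idea of steering the denoising trajectory in a gradient-free manner can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a straight path x_t = (1 - t) * x0 + t * noise, replaces the pretrained RFM with a closed-form velocity for Gaussian data, and uses a masked L2 cost as a stand-in for an inpainting-style objective. The names `toy_velocity`, `sample`, `mask`, `ref`, and `step_size` are all hypothetical. The cost gradient is taken with respect to the one-step clean estimate and applied directly to x_t, so nothing is backpropagated through the model or the ODE solver.

```python
import numpy as np


def toy_velocity(x_t, t):
    """Closed-form velocity for data ~ N(0, I) and noise ~ N(0, I).

    Hypothetical stand-in for a pretrained RFM. It assumes the path
    x_t = (1 - t) * x0 + t * noise, so t = 1 is noise and t = 0 is data.
    """
    a = (1.0 - t) / ((1.0 - t) ** 2 + t ** 2)  # E[x0 | x_t] = a * x_t
    return (x_t - a * x_t) / t                 # E[noise - x0 | x_t]


def sample(x_init, ref=None, mask=None, n_steps=50, step_size=0.5):
    """Euler sampling from noise to data, optionally steered toward ref."""
    x = x_init.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt                  # runs from 1 down to dt, never 0
        v = toy_velocity(x, t)
        if ref is not None:
            x0_hat = x - t * v            # one-step estimate of the clean sample
            # Gradient of 0.5 * ||mask * (x0_hat - ref)||^2 w.r.t. x0_hat.
            grad = mask * (x0_hat - ref)
            # Assumption: apply that gradient to x_t directly instead of
            # differentiating through the velocity model.
            x = x - step_size * grad
        x = x - dt * v                    # Euler step toward the data end
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(64)
ref = np.full(64, 2.0)                    # target values for the known entries
mask = np.zeros(64)
mask[:32] = 1.0                           # only the first half is constrained

plain = sample(noise)
steered = sample(noise, ref=ref, mask=mask)

err_plain = np.abs(mask * (plain - ref)).sum() / mask.sum()
err_steered = np.abs(mask * (steered - ref)).sum() / mask.sum()
print(f"masked error without steering: {err_plain:.3f}")
print(f"masked error with steering:    {err_steered:.3f}")
```

In this toy setting the velocity acts on each coordinate independently, so the unconstrained entries follow the same trajectory with or without steering, while the constrained entries are pulled toward `ref`. A real latent RFM would need a decoder-aware cost and tuned step sizes, which this sketch does not attempt.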
Community
Project Page: https://flowchef.github.io/
Demo FlowChef + Flux (Image Editing + Inverse Problem): Link
Demo FlowChef + InstaFlow (Image Editing): Link
Demo FlowChef + InstaFlow (Inpainting): Link
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Taming Rectified Flow for Inversion and Editing (2024)
- Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations (2024)
- SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis (2024)
- PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing (2024)
- DiT4Edit: Diffusion Transformer for Image Editing (2024)
- FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation (2024)
- VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models (2024)