To ease textual image editing, we present LEDITS++, a novel method for efficient and versatile image
editing using text-to-image diffusion models. First, LEDITS++ sets itself apart as a parameter-free
solution, requiring neither fine-tuning nor optimization. We derive the characteristics of an
edit-friendly noise space with perfect input reconstruction, previously proposed for the DDPM
sampling scheme, for a significantly faster multistep stochastic differential-equation (SDE)
solver. This novel invertibility of DPM-Solver++ enables editing with LEDITS++ in as
few as 20 total diffusion steps for inversion and inference combined.
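The perfect-reconstruction property can be illustrated with a toy stochastic sampler. The sketch below is not the actual LEDITS++ inversion; the reverse-step mean, noise scale, and forward samples are simplified stand-ins. The key idea it demonstrates is generic: given any sequence of noisy states, one can solve the reverse update for the noise terms z_t so that resampling with exactly those noise maps reproduces the input exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20
x0 = rng.standard_normal(4)  # toy 1-D "image"

def reverse_mean(x, t):
    # Stand-in for the denoiser-derived posterior mean mu_t(x_t);
    # the construction below works for any such function.
    return 0.9 * x

sigma = 0.1  # per-step noise scale of the stochastic sampler

# "Inversion": draw noisy states x_1..x_T (here simply x0 plus noise),
# then solve each reverse step x_{t-1} = mu_t(x_t) + sigma * z_t for the
# noise map z_t that lands exactly on the previous state.
xs = [x0] + [x0 + (t / T) * rng.standard_normal(4) for t in range(1, T + 1)]
zs = [(xs[t - 1] - reverse_mean(xs[t], t)) / sigma for t in range(1, T + 1)]

# Reconstruction: rerun the reverse chain with the extracted noise maps.
x = xs[T].copy()
for t in range(T, 0, -1):
    x = reverse_mean(x, t) + sigma * zs[t - 1]

print(np.max(np.abs(x - x0)))  # exact up to float rounding
```

Because each z_t is constructed to cancel the reverse step's error exactly, reconstruction holds by design; editing then amounts to changing the denoiser's conditioning while reusing the same noise maps.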
Moreover, LEDITS++ places a strong emphasis on semantic grounding to enhance the visual and
contextual coherence of the edits. This ensures that changes are confined to the relevant regions of
the image, preserving the fidelity of the original image as far as possible. LEDITS++ also provides users
with the flexibility to combine multiple edits seamlessly, opening up new creative possibilities for
intricate image manipulations. Finally, the approach is architecture-agnostic and compatible with any
diffusion model, whether latent or pixel-based.
Methodology
The methodology of LEDITS++ can be broken down into three components: (1) efficient image
inversion, (2) versatile textual editing, and (3) semantic grounding of image changes.
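The three components above can be sketched as a minimal pipeline skeleton. All function names, shapes, and bodies below are illustrative placeholders, not the actual LEDITS++ implementation; components (1) and (2) are stubbed, while (3) shows the generic mask-based compositing that confines an edit to the relevant region.

```python
import numpy as np

def invert(image, num_steps):
    # (1) Efficient image inversion: recover a noisy starting latent and
    # per-step noise maps enabling perfect reconstruction (stubbed here).
    return image.copy(), [np.zeros_like(image) for _ in range(num_steps)]

def denoise(latent, noise_maps, prompts):
    # (2) Versatile textual editing: prompt-guided denoising that reuses
    # the extracted noise maps (stubbed as a uniform change).
    return latent + 1.0

def ground(edited, original, mask):
    # (3) Semantic grounding: keep the edit inside the masked region and
    # the original content everywhere else.
    return mask * edited + (1.0 - mask) * original

image = np.zeros((4, 4))
mask = np.pad(np.ones((2, 2)), 1)  # relevant 2x2 region in the center
latent, noise_maps = invert(image, num_steps=10)
edited = denoise(latent, noise_maps, ["an example edit prompt"])
result = ground(edited, image, mask)
# Pixels outside the mask equal the original; only the masked region changes.
```

The compositing step in `ground` is the essence of semantic grounding: however the edit alters the latent globally, only the masked region propagates into the result, which is what preserves the original image's fidelity outside the edited area.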