Diffusers documentation

UniPCMultistepScheduler

You are viewing version v0.29.2. A newer version, v0.31.0, is available.

UniPCMultistepScheduler is a training-free framework designed for fast sampling of diffusion models. It was introduced in UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models by Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, Jiwen Lu.

It consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders. UniPC is model-agnostic by design, supporting pixel-space and latent-space DPMs for both unconditional and conditional sampling. It can be applied to both noise prediction and data prediction models. The corrector UniC can also be applied after any off-the-shelf solver to increase the order of accuracy.

The abstract from the paper is:

Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis. However, sampling from a pre-trained DPM is time-consuming due to the multiple evaluations of the denoising network, making it more and more important to accelerate the sampling of DPMs. Despite recent progress in designing fast samplers, existing methods still cannot generate satisfying images in many applications where fewer steps (e.g., <10) are favored. In this paper, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. Combining UniP and UniC, we propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs, which has a unified analytical form for any order and can significantly improve the sampling quality over previous methods, especially in extremely few steps. We evaluate our methods through extensive experiments including both unconditional and conditional sampling using pixel-space and latent-space DPMs. Our UniPC can achieve 3.87 FID on CIFAR10 (unconditional) and 7.51 FID on ImageNet 256×256 (conditional) with only 10 function evaluations. Code is available at this https URL.

Tips

It is recommended to set solver_order=2 for guided sampling, and solver_order=3 for unconditional sampling.

Dynamic thresholding from Imagen is supported, and for pixel-space diffusion models, you can set both predict_x0=True and thresholding=True to use dynamic thresholding. This thresholding method is unsuitable for latent-space diffusion models such as Stable Diffusion.
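
To make the dynamic thresholding idea more concrete, here is a minimal pure-Python sketch that operates on a flat list of predicted x0 values. It is not the library's implementation: the helper names `quantile` and `dynamic_threshold` and the linear-interpolation quantile are assumptions made for illustration only.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list (illustrative helper)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def dynamic_threshold(x0, ratio=0.995, sample_max_value=1.0):
    """Clip a predicted x0 to [-s, s], then rescale by s.

    s is the `ratio` quantile of |x0|, kept within [1, sample_max_value].
    """
    s = quantile([abs(v) for v in x0], ratio)
    s = min(max(s, 1.0), sample_max_value)
    return [min(max(v, -s), s) / s for v in x0]


x0_pred = [-3.0, -0.5, 0.0, 0.25, 2.0]
# With the default cap of 1.0, s is always 1, so this reduces to clipping to [-1, 1].
print(dynamic_threshold(x0_pred))
# With a larger cap, outliers are clipped at the quantile and everything is rescaled.
print(dynamic_threshold(x0_pred, sample_max_value=4.0))
```

Because the result always lies in [-1, 1], this only makes sense when the model predicts data in that range, which is why it suits pixel-space models rather than latent-space ones.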

UniPCMultistepScheduler

class diffusers.UniPCMultistepScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: Union = None solver_order: int = 2 prediction_type: str = 'epsilon' thresholding: bool = False dynamic_thresholding_ratio: float = 0.995 sample_max_value: float = 1.0 predict_x0: bool = True solver_type: str = 'bh2' lower_order_final: bool = True disable_corrector: List = [] solver_p: SchedulerMixin = None use_karras_sigmas: Optional = False timestep_spacing: str = 'linspace' steps_offset: int = 0 final_sigmas_type: Optional = 'zero' rescale_betas_zero_snr: bool = False )

Parameters

  • num_train_timesteps (int, defaults to 1000) — The number of diffusion steps to train the model.
  • beta_start (float, defaults to 0.0001) — The starting beta value of inference.
  • beta_end (float, defaults to 0.02) — The final beta value.
  • beta_schedule (str, defaults to "linear") — The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • trained_betas (np.ndarray, optional) — Pass an array of betas directly to the constructor to bypass beta_start and beta_end.
  • solver_order (int, default 2) — The UniPC order which can be any positive integer. The effective order of accuracy is solver_order + 1 due to the UniC. It is recommended to use solver_order=2 for guided sampling, and solver_order=3 for unconditional sampling.
  • prediction_type (str, defaults to epsilon, optional) — Prediction type of the scheduler function; can be epsilon (predicts the noise of the diffusion process), sample (directly predicts the noisy sample) or v_prediction (see section 2.4 of the Imagen Video paper).
  • thresholding (bool, defaults to False) — Whether to use the “dynamic thresholding” method. This is unsuitable for latent-space diffusion models such as Stable Diffusion.
  • dynamic_thresholding_ratio (float, defaults to 0.995) — The ratio for the dynamic thresholding method. Valid only when thresholding=True.
  • sample_max_value (float, defaults to 1.0) — The threshold value for dynamic thresholding. Valid only when thresholding=True and predict_x0=True.
  • predict_x0 (bool, defaults to True) — Whether to use the updating algorithm on the predicted x0.
  • solver_type (str, default bh2) — Solver type for UniPC. It is recommended to use bh1 for unconditional sampling when steps < 10, and bh2 otherwise.
  • lower_order_final (bool, default True) — Whether to use lower-order solvers in the final steps. Only valid for < 15 inference steps. This can stabilize the sampling of DPMSolver for steps < 15, especially for steps <= 10.
  • disable_corrector (list, default []) — Decides which step to disable the corrector to mitigate the misalignment between epsilon_theta(x_t, c) and epsilon_theta(x_t^c, c) which can influence convergence for a large guidance scale. Corrector is usually disabled during the first few steps.
  • solver_p (SchedulerMixin, default None) — Any other scheduler. If specified, the algorithm becomes solver_p + UniC.
  • use_karras_sigmas (bool, optional, defaults to False) — Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If True, the sigmas are determined according to a sequence of noise levels {σi}.
  • timestep_spacing (str, defaults to "linspace") — The way the timesteps should be scaled. Refer to Table 2 of the Common Diffusion Noise Schedules and Sample Steps are Flawed for more information.
  • steps_offset (int, defaults to 0) — An offset added to the inference steps, as required by some model families.
  • final_sigmas_type (str, defaults to "zero") — The final sigma value for the noise schedule during the sampling process. If "sigma_min", the final sigma is the same as the last sigma in the training schedule. If "zero", the final sigma is set to 0.
  • rescale_betas_zero_snr (bool, defaults to False) — Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and dark samples instead of limiting it to samples with medium brightness. Loosely related to --offset_noise.
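
As a rough illustration of how beta_start, beta_end, and beta_schedule relate, the sketch below builds the linear and scaled_linear schedules in plain Python and derives the cumulative product of alphas. The function names are hypothetical, and the scaled_linear form (linear in the square root of beta) follows the commonly used definition rather than anything stated on this page.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, inclusive."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def make_betas(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02,
               beta_schedule="linear"):
    """Build a beta sequence for two of the schedules named above (sketch)."""
    if beta_schedule == "linear":
        return linspace(beta_start, beta_end, num_train_timesteps)
    if beta_schedule == "scaled_linear":
        # Assumption: linear in sqrt(beta), then squared.
        roots = linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps)
        return [r * r for r in roots]
    raise ValueError(f"unsupported beta_schedule in this sketch: {beta_schedule}")


def alphas_cumprod(betas):
    """Running product of (1 - beta_t); decreases as more noise is added."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = make_betas()
acp = alphas_cumprod(betas)
print(len(betas), betas[0], betas[-1])
print(acp[0], acp[-1])
```

Passing trained_betas would correspond to skipping `make_betas` and feeding your own sequence straight into `alphas_cumprod`.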

UniPCMultistepScheduler is a training-free framework designed for the fast sampling of diffusion models.

This scheduler inherits from SchedulerMixin and ConfigMixin. Check the superclass documentation for the generic methods the library implements for all schedulers, such as loading and saving.

convert_model_output

( model_output: Tensor *args sample: Tensor = None **kwargs ) torch.Tensor

Parameters

  • model_output (torch.Tensor) — The direct output from the learned diffusion model.
  • timestep (int) — The current discrete timestep in the diffusion chain.
  • sample (torch.Tensor) — A current instance of a sample created by the diffusion process.

Returns

torch.Tensor

The converted model output.

Convert the model output to the corresponding type the UniPC algorithm needs.
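
The conversion can be sketched with the standard relations between noise, velocity, and data predictions. This is an illustrative scalar version for the predict_x0=True case, not the library's code: the function name `to_x0_prediction` is hypothetical, and `alpha_t` and `sigma_t` are assumed to be the signal and noise scales with x_t = alpha_t * x0 + sigma_t * eps.

```python
import math


def to_x0_prediction(model_output, sample, alpha_t, sigma_t, prediction_type="epsilon"):
    """Convert a model output into a prediction of the clean sample x0 (sketch)."""
    if prediction_type == "epsilon":
        # Model predicted the noise: invert x_t = alpha_t * x0 + sigma_t * eps.
        return (sample - sigma_t * model_output) / alpha_t
    if prediction_type == "sample":
        # Model already predicts x0.
        return model_output
    if prediction_type == "v_prediction":
        # Model predicted v = alpha_t * eps - sigma_t * x0.
        return alpha_t * sample - sigma_t * model_output
    raise ValueError(f"unknown prediction_type: {prediction_type}")


# Toy check: build x_t from a known x0 and eps, then recover x0 both ways.
alpha_bar = 0.64
alpha_t, sigma_t = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
x0, eps = 0.3, -1.2
x_t = alpha_t * x0 + sigma_t * eps
v = alpha_t * eps - sigma_t * x0

print(to_x0_prediction(eps, x_t, alpha_t, sigma_t, "epsilon"))
print(to_x0_prediction(v, x_t, alpha_t, sigma_t, "v_prediction"))
```

Both calls should recover the original x0 up to floating-point error, which is a quick way to sanity-check the sign conventions.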

multistep_uni_c_bh_update

( this_model_output: Tensor *args last_sample: Tensor = None this_sample: Tensor = None order: int = None **kwargs ) torch.Tensor

Parameters

  • this_model_output (torch.Tensor) — The model outputs at x_t.
  • this_timestep (int) — The current timestep t.
  • last_sample (torch.Tensor) — The generated sample before the last predictor x_{t-1}.
  • this_sample (torch.Tensor) — The generated sample after the last predictor x_{t}.
  • order (int) — The p of UniC-p at this step. The effective order of accuracy should be order + 1.

Returns

torch.Tensor

The corrected sample tensor at the current timestep.

One step for the UniC (B(h) version).

multistep_uni_p_bh_update

( model_output: Tensor *args sample: Tensor = None order: int = None **kwargs ) torch.Tensor

Parameters

  • model_output (torch.Tensor) — The direct output from the learned diffusion model at the current timestep.
  • prev_timestep (int) — The previous discrete timestep in the diffusion chain.
  • sample (torch.Tensor) — A current instance of a sample created by the diffusion process.
  • order (int) — The order of UniP at this timestep (corresponds to the p in UniPC-p).

Returns

torch.Tensor

The sample tensor at the previous timestep.

One step for the UniP (B(h) version). Alternatively, self.solver_p is used if it is specified.

scale_model_input

( sample: Tensor *args **kwargs ) torch.Tensor

Parameters

  • sample (torch.Tensor) — The input sample.

Returns

torch.Tensor

A scaled input sample.

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_begin_index

( begin_index: int = 0 )

Parameters

  • begin_index (int) — The begin index for the scheduler.

Sets the begin index for the scheduler. This function should be run from the pipeline before inference.

set_timesteps

( num_inference_steps: int device: Union = None )

Parameters

  • num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device, optional) — The device to which the timesteps should be moved. If None, the timesteps are not moved.

Sets the discrete timesteps used for the diffusion chain (to be run before inference).
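
For intuition, the sketch below shows one plausible way a "linspace" timestep_spacing could map num_inference_steps onto the training timesteps. The function name is hypothetical and the exact rounding and endpoint handling are assumptions; the actual scheduler may differ in detail (for example when Karras sigmas are enabled).

```python
def linspace_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly spaced, descending integer timesteps (illustrative sketch)."""
    last = num_train_timesteps - 1
    # num_inference_steps + 1 grid points from 0 to the last training timestep.
    grid = [round(i * last / num_inference_steps) for i in range(num_inference_steps + 1)]
    # Reverse so sampling starts at the noisiest timestep, then drop the final 0.
    return grid[::-1][:-1]


timesteps = linspace_timesteps(10)
print(timesteps)
```

The returned list has exactly num_inference_steps entries, starts at the last training timestep, and decreases monotonically, which is the order in which a denoising loop would visit them.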

step

( model_output: Tensor timestep: int sample: Tensor return_dict: bool = True ) SchedulerOutput or tuple

Parameters

  • model_output (torch.Tensor) — The direct output from the learned diffusion model.
  • timestep (int) — The current discrete timestep in the diffusion chain.
  • sample (torch.Tensor) — A current instance of a sample created by the diffusion process.
  • return_dict (bool) — Whether or not to return a SchedulerOutput or tuple.

Returns

SchedulerOutput or tuple

If return_dict is True, SchedulerOutput is returned, otherwise a tuple is returned where the first element is the sample tensor.

Predict the sample from the previous timestep by reversing the SDE. This function propagates the sample with the multistep UniPC.
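
A multistep solver needs a history of model outputs, so the order it can use ramps up over the first steps and, with lower_order_final, ramps down again near the end. The sketch below plans which order and whether the corrector would run at each step. It is a simplified model of that bookkeeping under stated assumptions: the function name `unipc_step_plan` is hypothetical, and the indexing convention for disable_corrector (matching the step whose prediction would be corrected) is an assumption, not something this page specifies.

```python
def unipc_step_plan(num_inference_steps, solver_order=2, lower_order_final=True,
                    disable_corrector=()):
    """Return (step_index, order, use_corrector) for each step (sketch)."""
    plan = []
    history = 0  # number of earlier model outputs available, capped at solver_order
    for step_index in range(num_inference_steps):
        # The corrector refines the previous prediction, so it cannot run on step 0.
        # Assumption: disable_corrector lists the index of that previous step.
        use_corrector = step_index > 0 and (step_index - 1) not in disable_corrector

        if lower_order_final:
            # Do not use a higher order than the number of steps remaining.
            order = min(solver_order, num_inference_steps - step_index)
        else:
            order = solver_order
        # Warm-up: the order cannot exceed the available history plus one.
        order = min(order, history + 1)

        plan.append((step_index, order, use_corrector))
        if history < solver_order:
            history += 1
    return plan


for row in unipc_step_plan(5, solver_order=3, disable_corrector=[0]):
    print(row)
# orders over the five steps: 1, 2, 3, 2, 1
```

With five steps and solver_order=3, the order climbs to 3 in the middle and falls back to 1 on the last step, while the corrector is skipped on the first step and on the step listed in disable_corrector.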

SchedulerOutput

class diffusers.schedulers.scheduling_utils.SchedulerOutput

( prev_sample: Tensor )

Parameters

  • prev_sample (torch.Tensor of shape (batch_size, num_channels, height, width) for images) — Computed sample (x_{t-1}) of previous timestep. prev_sample should be used as next model input in the denoising loop.

Base class for the output of a scheduler’s step function.
