---
license: mit
---

# CleanDIFT Model Card

Diffusion models learn powerful world representations that have proven valuable for tasks such as semantic correspondence detection, depth estimation, semantic segmentation, and classification. However, diffusion models require noisy input images, which destroys information and introduces the noise level as a hyperparameter that must be tuned for each task. We introduce CleanDIFT, a novel method for extracting noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. The approach is efficient, training on a single GPU in just 30 minutes. We publish these models alongside our paper ["CleanDIFT: Diffusion Features without Noise"](https://compvis.github.io/CleanDIFT/).

We provide checkpoints for Stable Diffusion 1.5 and Stable Diffusion 2.1.

## Usage

For detailed examples of how to extract features with CleanDIFT and how to use them for downstream tasks, please refer to the notebooks provided [here](https://github.com/CompVis/CleanDIFT/tree/main/notebooks).

Our checkpoints are fully compatible with the `diffusers` library. If you already have a pipeline using SD 1.5 or SD 2.1 from `diffusers`, you can simply replace the U-Net state dict:

```python
from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")

ckpt_pth = hf_hub_download(repo_id="CompVis/cleandift", filename="cleandift_sd21_unet.safetensors")
state_dict = load_file(ckpt_pth)
unet.load_state_dict(state_dict, strict=True)
```
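
The notebooks linked above show the intended feature-extraction workflow. As a rough orientation only, the sketch below illustrates one generic way to read out intermediate U-Net activations for a clean (noise-free) image using a standard PyTorch forward hook, continuing from the `unet` loaded in the snippet above. The block index (`up_blocks[1]`), the 768×768 input size, the empty-prompt conditioning, and the fixed `timestep=0` are illustrative assumptions and not necessarily the settings used in the paper; consult the notebooks for the exact procedure.

```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel, CLIPTokenizer

# Capture the output of one U-Net up block via a forward hook.
# NOTE: the choice of up_blocks[1] is an illustrative assumption.
features = {}

def hook(module, inputs, output):
    features["up_block_1"] = output

unet.up_blocks[1].register_forward_hook(hook)

# Encode a clean image into latent space (no noise is added).
vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="vae")
tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="text_encoder")

# Placeholder for a preprocessed input image, scaled to [-1, 1].
image = torch.randn(1, 3, 768, 768)

with torch.no_grad():
    latents = vae.encode(image).latent_dist.mean * vae.config.scaling_factor

    # Empty prompt as unconditional text conditioning (assumption).
    tokens = tokenizer(
        [""], padding="max_length", max_length=tokenizer.model_max_length, return_tensors="pt"
    )
    text_emb = text_encoder(tokens.input_ids)[0]

    # CleanDIFT features are timestep-independent; passing timestep=0 here is an assumption.
    unet(latents, timestep=0, encoder_hidden_states=text_emb)

print(features["up_block_1"].shape)
```

The hook fires during the single forward pass and leaves the captured activation in `features`, which can then be fed to a downstream head (e.g. for semantic correspondence) as demonstrated in the notebooks.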