This repository contains alternative or tuned versions of Stable Diffusion XL Base 1.0 in .safetensors format.

Available Models

sd_xl_base_1.0_fp16_vae.safetensors

This file contains the weights of sd_xl_base_1.0.safetensors, merged with the weights of sdxl_vae.safetensors from MadeByOllin's SDXL FP16 VAE repository.

sd_xl_base_1.0_inpainting_0.1.safetensors

This file contains the weights of sd_xl_base_1.0_fp16_vae.safetensors merged with the weights from diffusers/stable-diffusion-xl-1.0-inpainting-0.1.

How to Create an SDXL Inpainting Checkpoint from any SDXL Checkpoint

Using the .safetensors files here, you can calculate an inpainting model using the formula A + (B - C), where:

A is sd_xl_base_1.0_inpainting_0.1.safetensors
B is your fine-tuned checkpoint
C is sd_xl_base_1.0_fp16_vae.safetensors
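The formula above can be sketched in Python as a per-tensor "add difference" merge. This is a minimal illustration, not the procedure ENFUGUE or any other tool actually runs. The function names, the file name my_finetune.safetensors, and the output name are hypothetical. The handling of shape-mismatched tensors rests on the assumption that the only mismatch is the inpainting UNet's input convolution, which takes extra input channels for the mask and masked image; in that case the delta is applied to the leading channels only.

```python
def add_difference(a, b, c):
    """Return A + (B - C) for every tensor name in A.

    a, b, c map tensor names to tensors, e.g. the dicts returned by
    safetensors.torch.load_file.
    """
    merged = {}
    for key, a_t in a.items():
        if key not in b or key not in c:
            # Tensor exists only in the inpainting checkpoint: keep it as-is.
            merged[key] = a_t
            continue
        delta = b[key] - c[key]
        if getattr(a_t, "shape", None) == getattr(delta, "shape", None):
            merged[key] = a_t + delta
        else:
            # Assumption: A has more input channels (dim 1) than B and C,
            # so the delta only applies to the channels they share.
            out = a_t.clone()
            n = delta.shape[1]
            out[:, :n] = out[:, :n] + delta
            merged[key] = out
    return merged


def merge_files(a_path, b_path, c_path, out_path):
    """Load three .safetensors checkpoints, merge them, and save the result."""
    # Imported lazily so add_difference works without torch/safetensors.
    from safetensors.torch import load_file, save_file

    merged = add_difference(load_file(a_path), load_file(b_path), load_file(c_path))
    save_file(merged, out_path)


# Example call (hypothetical file names for B and the output):
# merge_files(
#     "sd_xl_base_1.0_inpainting_0.1.safetensors",  # A
#     "my_finetune.safetensors",                    # B
#     "sd_xl_base_1.0_fp16_vae.safetensors",        # C
#     "my_finetune_inpainting.safetensors",
# )
```

Loading three full SDXL checkpoints at once needs a fair amount of RAM; a more careful version could stream tensors one key at a time.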

Using ENFUGUE's Web UI:

You must use the two files in this repository for this to work. The Diffusers team trained XL Inpainting using the FP16 XL VAE, so using a different XL base as C will apply an incorrect delta to the inpainting checkpoint, and the resulting VAE will be nonsensical.

Model Description

Developed by: The Diffusers team
Repackaged by: Benjamin Paine
Model type: Diffusion-based text-to-image generative model
License: CreativeML Open RAIL++-M License
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is outside the scope of its abilities.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render legible text.
The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.
When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
