Usage
- Use the Diffusers backend: Execution & Models -> Execution backend.
- Go into Compute Settings.
- Enable the Compress Model weights with NNCF options.
- Restart the WebUI if it's your first time using NNCF; otherwise, just reload the model.
Features
- Uses INT8, which halves the model size.
- Saves 3.4 GB of VRAM with SDXL.
- Works in the Diffusers backend.
Disadvantages
- Weights are dequantized via autocast, so the GPU will still run the model in 16 bit and will be slower.
- Uses INT8, which can break ControlNet.
- Using a LoRA will trigger a model reload.
- Not implemented in the Original backend.
- Fused projections are not compatible with NNCF.
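To illustrate the two points above, here is a minimal NumPy sketch of weight-only INT8 compression with 16-bit compute. This is a hypothetical illustration of the general technique, not SD.Next's or NNCF's actual code: weights are stored as int8 plus a scale (halving storage), and dequantized back to fp16 at run time, which is why the GPU still computes in 16 bit.

```python
import numpy as np

def compress_int8(w_fp16: np.ndarray):
    """Quantize fp16 weights to int8 with a per-tensor symmetric scale."""
    scale = np.abs(w_fp16).max() / 127.0
    q = np.clip(np.round(w_fp16 / scale), -127, 127).astype(np.int8)
    return q, np.float16(scale)

def dequantize(q: np.ndarray, scale) -> np.ndarray:
    """At run time the weights are expanded back to fp16 (the autocast step):
    only storage is 8 bit; compute still happens in 16 bit."""
    return (q.astype(np.float16) * scale).astype(np.float16)

w = np.random.randn(1024, 1024).astype(np.float16)   # stand-in for a weight tensor
q, scale = compress_int8(w)

print(w.nbytes // q.nbytes)   # storage halves: fp16 -> int8
```

The rounding step is also why precision-sensitive add-ons such as ControlNet can misbehave: the dequantized weights are close to, but not identical to, the originals.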
Options
These results compare NNCF 8 bit to 16 bit.

Model:
Compresses the UNet or Transformer part of the model.
This is where most of the memory savings happen for Stable Diffusion.
- SDXL: ~2500 MB memory savings.
- SD 1.5: ~750 MB memory savings.
- PixArt-XL-2: ~600 MB memory savings.

Text Encoder:
Compresses the Text Encoder parts of the model.
This is where most of the memory savings happen for PixArt.
- PixArt-XL-2: ~4750 MB memory savings.
- SDXL: ~750 MB memory savings.
- SD 1.5: ~120 MB memory savings.

VAE:
Compresses the VAE part of the model.
Memory savings from compressing the VAE are fairly small.
- SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings.
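Summing the per-component figures above for SDXL roughly reproduces the overall VRAM saving quoted earlier (the component names below are just labels for this calculation):

```python
# Approximate per-component NNCF INT8 savings for SDXL, in MB, from the lists above
savings_mb = {"model": 2500, "text_encoder": 750, "vae": 75}
total_mb = sum(savings_mb.values())
print(total_mb)   # 3325 MB, i.e. close to the ~3.4 GB quoted under Features
```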
4 Bit Compression and Quantization:
4 bit compression modes and quantization can be used with the OpenVINO backend.
For more info: https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
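As a rough sketch of why 4 bit halves storage again relative to INT8, two 4-bit values can be packed into each byte. This is a hypothetical NumPy illustration of nibble packing, not OpenVINO's actual implementation:

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack unsigned 4-bit values (0..15) two per byte: half the int8 size."""
    assert q.min() >= 0 and q.max() <= 15 and q.size % 2 == 0
    q = q.astype(np.uint8)
    return (q[0::2] << 4) | q[1::2]   # high nibble | low nibble

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Recover the original 4-bit values from the packed bytes."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out

q = np.random.randint(0, 16, size=1024).astype(np.uint8)  # stand-in 4-bit weights
packed = pack_int4(q)
print(packed.nbytes)   # 512 bytes: half of the 1024-byte int8 storage
```

Unpacking is lossless, so unlike the quantization step itself, the packing adds no extra error.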