## Usage

0. Use the Diffusers backend: `Execution & Models` -> `Execution backend`
1. Go into `Compute Settings`
2. Enable the `Compress Model weights with NNCF` options
3. Restart the WebUI if this is your first time using NNCF; otherwise, just reload the model

### Features

* Uses INT8, which halves the model size; saves 3.4 GB of VRAM with SDXL
* Works in the Diffusers backend

### Disadvantages

* Uses autocast: the GPU still runs the model in 16 bit, so inference is slower
* Uses INT8, which can break ControlNet
* Using a LoRA triggers a model reload
* Not implemented in the Original backend
* Fused projections are not compatible with NNCF

## Options

These results compare NNCF 8 bit against 16 bit.

- Model: Compresses the UNet or Transformer part of the model. This is where the largest memory savings happen for Stable Diffusion.
  - SDXL: ~2500 MB memory savings
  - SD 1.5: ~750 MB memory savings
  - PixArt-XL-2: ~600 MB memory savings
- Text Encoder: Compresses the Text Encoder parts of the model. This is where the largest memory savings happen for PixArt.
  - PixArt-XL-2: ~4750 MB memory savings
  - SDXL: ~750 MB memory savings
  - SD 1.5: ~120 MB memory savings
- VAE: Compresses the VAE part of the model. Memory savings from compressing the VAE are fairly small.
  - SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings
- 4 Bit Compression and Quantization: 4 bit compression modes and quantization can be used with the OpenVINO backend. For more info: https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
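The trade-off described above (INT8 storage, 16-bit compute via autocast) can be sketched in plain NumPy. This is an illustrative sketch of weight-only quantization, not SD.Next's or NNCF's actual implementation; the function names and the per-channel symmetric scheme are assumptions for demonstration:

```python
import numpy as np

def compress_int8(w_fp16):
    """Illustrative per-channel symmetric INT8 weight compression."""
    # One FP32 scale per output row, mapping the row's max magnitude to 127
    scale = np.abs(w_fp16).max(axis=1, keepdims=True).astype(np.float32) / 127.0
    q = np.clip(np.round(w_fp16.astype(np.float32) / scale), -127, 127).astype(np.int8)
    return q, scale

def decompress_fp16(q, scale):
    """Runtime 'autocast' step: dequantize back to 16 bit before the matmul.

    This extra step is why the GPU still computes in 16 bit and runs slower,
    even though the stored weights are half the size.
    """
    return (q.astype(np.float32) * scale).astype(np.float16)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float16)

q, scale = compress_int8(w)
w_hat = decompress_fp16(q, scale)

# INT8 storage is half the size of the FP16 original
print(q.nbytes, w.nbytes)  # 32 64

# Reconstruction error is small but nonzero -- this rounding error is
# why downstream consumers of exact activations (e.g. ControlNet) can break
print(float(np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max()))
```

The same arithmetic explains the reported savings: compressing only the UNet/Transformer saves the most on Stable Diffusion because that component holds most of the weights, while the VAE is small enough that halving it recovers only tens of megabytes.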