---
license: other
license_name: sacla
license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo/blob/main/LICENSE.md
base_model:
- stabilityai/stable-diffusion-3.5-large-turbo
base_model_relation: quantized
---
## Overview
These models are made to work with [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) release [master-ac54e00](https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-ac54e00) onwards. Support for other inference backends is not guaranteed.
Quantized using [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/447).
Normal K-quants do not work properly with SD3.5 Large models: around 90% of the weights are in tensors whose shape doesn't match the 256-weight superblock size of K-quants, so they can't be quantized this way. Mixing quantization types lets us take advantage of the better fidelity of K-quants to some extent while keeping the model file size relatively small.
Only the second layers of both MLPs in each MMDiT block of SD3.5 Large models have a shape compatible with K-quants. These still account for about 10% of all the parameters.
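
The shape constraint above can be illustrated with a minimal sketch. It is not the code from the PR: it assumes the usual ggml rule that a tensor's row length must be a multiple of the block size of its quantization type, and it assumes a hidden size of 2432 with a 4x MLP expansion for SD3.5 Large (neither figure is stated in this card). The function name and layer labels are hypothetical.

```python
QK_K = 256      # K-quant superblock size (weights per superblock)
QK_LEGACY = 32  # block size of legacy types such as q4_0, q4_1, q5_0


def pick_type(row_length: int, k_type: str = "q4_k", fallback: str = "q4_0") -> str:
    """Pick a quantization type from the length of a tensor's rows.

    Hypothetical helper: use the K-quant when rows split evenly into
    256-weight superblocks, otherwise fall back to a 32-weight block type,
    and keep the tensor in f16 if neither fits.
    """
    if row_length % QK_K == 0:
        return k_type
    if row_length % QK_LEGACY == 0:
        return fallback
    return "f16"


# Assumed SD3.5 Large dimensions (not taken from this card).
HIDDEN = 2432
MLP_HIDDEN = 4 * HIDDEN

# Illustrative labels only, not real tensor names.
for label, row_length in [
    ("first MLP layer (rows of length hidden)", HIDDEN),
    ("second MLP layer (rows of length 4 * hidden)", MLP_HIDDEN),
]:
    print(f"{label}: {pick_type(row_length)}")
```

Under these assumptions, only the second MLP layer has rows that are a multiple of 256, which matches the description above.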
## Files:
### Non-Linear Type:
- [sd3.5_large_turbo-iq4_nl.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-iq4_nl.gguf): Same size as q4_k_4_0 and q4_0, runs faster than q4_k_4_0 (on Vulkan at least), and provides better image quality. Recommended
### Mixed Types:
- [sd3.5_large_turbo-q2_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q2_k_4_0.gguf): Smallest quantization yet. Use this if you can't afford anything bigger
- [sd3.5_large_turbo-q3_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q3_k_4_0.gguf): Smaller than q4_0, acceptable degradation.
- [sd3.5_large_turbo-q4_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_4_0.gguf): Exactly the same size as q4_0 and iq4_nl. I recommend using iq4_nl instead.
- [sd3.5_large_turbo-q4_k_4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_4_1.gguf): Smaller than q4_1, and with comparable degradation. Recommended
- [sd3.5_large_turbo-q4_k_5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_5_0.gguf): Smaller than q5_0, and with comparable degradation. Very close to the original f16 already. Recommended
### Legacy types:
- [sd3.5_large_turbo-q4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_0.gguf): Same size as q4_k_4_0. Not recommended (use iq4_nl or q4_k_4_0 instead)
- [sd3.5_large_turbo-q4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_1.gguf): Not recommended (q4_k_4_1 is better and smaller)
- [sd3.5_large_turbo-q5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_0.gguf): Barely better than q4_k_5_0, and bigger
- [sd3.5_large_turbo-q5_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_1.gguf): Better and bigger than q5_0
- [sd3.5_large_turbo-q8_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q8_0.gguf): Basically indistinguishable from the original f16, but much smaller. Recommended for best quality
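
The size relationships in the lists above can be sanity-checked with a rough estimate. The sketch below assumes that a name like `q4_k_4_1` means "q4_k for the K-quant-compatible tensors, q4_1 for the rest" (an interpretation, not something this card states), uses the "about 10%" figure from the overview, and ignores tensors that stay unquantized. The bits-per-weight values come from the ggml block layouts; `mixed_bpw` is a hypothetical helper.

```python
# Bits per weight of ggml block formats (block bytes * 8 / weights per block).
BPW = {
    "q4_0": 4.5, "q4_1": 5.0, "q5_0": 5.5, "q5_1": 6.0, "q8_0": 8.5,
    "iq4_nl": 4.5, "q2_k": 2.625, "q3_k": 3.4375, "q4_k": 4.5,
}

K_FRACTION = 0.10  # share of weights in K-quant-compatible tensors ("about 10%")


def mixed_bpw(k_type: str, legacy_type: str, k_fraction: float = K_FRACTION) -> float:
    """Estimate average bits per weight for a mixed quantization."""
    return k_fraction * BPW[k_type] + (1.0 - k_fraction) * BPW[legacy_type]


# Assumed decomposition of the mixed file names into (K-quant, legacy) pairs.
MIXES = {
    "q2_k_4_0": ("q2_k", "q4_0"),
    "q3_k_4_0": ("q3_k", "q4_0"),
    "q4_k_4_0": ("q4_k", "q4_0"),
    "q4_k_4_1": ("q4_k", "q4_1"),
    "q4_k_5_0": ("q4_k", "q5_0"),
}

for name, (k_type, legacy_type) in MIXES.items():
    print(f"{name}: ~{mixed_bpw(k_type, legacy_type):.3f} bits per weight")
```

With these assumptions, q4_k_4_0 comes out at the same bits per weight as q4_0 and iq4_nl, and q4_k_4_1 and q4_k_5_0 land slightly below q4_1 and q5_0 respectively, in line with the notes above.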
## Outputs:
Sorted by model size (note that q4_0, q4_k_4_0, and iq4_nl are exactly the same size):
| Quantization | Robot girl | Text | Cute kitten |
| ------------------ | -------------------------------- | ---------------------------------- | ---------------------------------- |
| q2_k_4_0 | ![q2_k_4_0](Images/q2_k_4_0.png) | ![q2_k_4_0](Images/1_q2_k_4_0.png) | ![q2_k_4_0](Images/2_q2_k_4_0.png) |
| q3_k_4_0 | ![q3_k_4_0](Images/q3_k_4_0.png) | ![q3_k_4_0](Images/1_q3_k_4_0.png) | ![q3_k_4_0](Images/2_q3_k_4_0.png) |
| q4_0 | ![q4_0](Images/q4_0.png) | ![q4_0](Images/1_q4_0.png) | ![q4_0](Images/2_q4_0.png) |
| q4_k_4_0 | ![q4_k_4_0](Images/q4_k_4_0.png) | ![q4_k_4_0](Images/1_q4_k_4_0.png) | ![q4_k_4_0](Images/2_q4_k_4_0.png) |
| iq4_nl | ![iq4_nl](Images/iq4_nl.png) | ![iq4_nl](Images/1_iq4_nl.png) | ![iq4_nl](Images/2_iq4_nl.png) |
| q4_k_4_1 | ![q4_k_4_1](Images/q4_k_4_1.png) | ![q4_k_4_1](Images/1_q4_k_4_1.png) | ![q4_k_4_1](Images/2_q4_k_4_1.png) |
| q4_1 | ![q4_1](Images/q4_1.png) | ![q4_1](Images/1_q4_1.png) | ![q4_1](Images/2_q4_1.png) |
| q4_k_5_0 | ![q4_k_5_0](Images/q4_k_5_0.png) | ![q4_k_5_0](Images/1_q4_k_5_0.png) | ![q4_k_5_0](Images/2_q4_k_5_0.png) |
| q5_0 | ![q5_0](Images/q5_0.png) | ![q5_0](Images/1_q5_0.png) | ![q5_0](Images/2_q5_0.png) |
| q5_1 | ![q5_1](Images/q5_1.png) | ![q5_1](Images/1_q5_1.png) | ![q5_1](Images/2_q5_1.png) |
| q8_0 | ![q8_0](Images/q8_0.png) | ![q8_0](Images/1_q8_0.png) | ![q8_0](Images/2_q8_0.png) |
| f16(sft) | ![f16](Images/f16.png) | ![f16](Images/1_f16.png) | ![f16](Images/2_f16.png) |

Generated with a modified version of sdcpp with [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/397) applied to enable support for CLIP timestep embeddings.
Text encoders used: q4_k quant of t5xxl, full-precision clip_g, and q8 quant of [ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14) in place of clip_l.
Full prompts and settings are in the PNG metadata.