ChrisGoringe
committed
Update README.md
README.md CHANGED
@@ -11,8 +11,10 @@ base_model: black-forest-labs/FLUX.1-dev
 
 A collection of GGUF models using mixed quantization (different layers quantized to different precision to optimise fidelity v. memory).
 
-They
-
+They were created using the [convert.py script](https://github.com/chrisgoringe/mixed-gguf-converter).
+
+They can be loaded in ComfyUI using the [ComfyUI GGUF Nodes](https://github.com/city96/ComfyUI-GGUF). Just put the gguf files in your
+models/unet directory.
 
 ## Naming convention (mx for 'mixed')
 
@@ -30,11 +32,13 @@ where NN_N is the approximate reduction in VRAM usage compared to the full 16 bit v
 
 The process for optimisation is as follows:
 
-- 240 prompts used for flux images popular at civit.ai were run through the full Flux.1-dev model
--
--
--
--
+- 240 prompts used for flux images popular at civit.ai were run through the full Flux.1-dev model with randomised resolution and step count.
+- For a randomly selected step in the inference, the hidden states before and after the layer stack were captured.
+- For each layer in turn, and for each of the Q8_0, Q5_1 and Q4_1 quantizations:
+  - A single layer was quantized
+  - The initial hidden states were processed by the modified layer stack
+  - The error (MSE) in the final hidden state was calculated
+- This gives a 'cost' for each possible layer quantization
 - An optimised quantization is one that gives the desired reduction in size for the smallest total cost
 - A series of recipes for optimization have been created from the calculated costs
 - The various 'in' blocks, the final layer blocks, and all normalization scale parameters are stored in float32
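The cost-measurement loop the new bullets describe is easy to sketch. The following PyTorch snippet is a minimal illustration only, not the actual convert.py code: the toy layer stack and the `fake_quantize` helper are stand-ins, and real GGUF quantization (Q8_0, Q5_1, Q4_1) uses block-wise scales and offsets rather than the single per-tensor scale used here.

```python
# Minimal sketch of the per-layer cost measurement (illustrative only).
import copy
import torch
import torch.nn as nn

BITS = {"Q8_0": 8, "Q5_1": 5, "Q4_1": 4}  # bit depths of the candidate quants

def fake_quantize(layer: nn.Module, quant: str) -> nn.Module:
    """Stand-in for GGUF quantization: round-trip the weights through a
    reduced bit depth with a single per-tensor scale."""
    q = copy.deepcopy(layer)
    levels = 2 ** (BITS[quant] - 1) - 1
    with torch.no_grad():
        for p in q.parameters():
            scale = p.abs().max() / levels + 1e-12
            p.copy_((p / scale).round() * scale)
    return q

@torch.no_grad()
def layer_costs(stack: nn.ModuleList, hidden_in: torch.Tensor,
                hidden_ref: torch.Tensor) -> dict:
    """Cost = MSE in the final hidden state when a single layer is quantized."""
    costs = {}
    for i in range(len(stack)):
        original = stack[i]
        for quant in BITS:
            stack[i] = fake_quantize(original, quant)  # degrade one layer only
            h = hidden_in
            for layer in stack:                        # re-run the layer stack
                h = layer(h)
            costs[(i, quant)] = torch.mean((h - hidden_ref) ** 2).item()
        stack[i] = original                            # restore full precision
    return costs

# Toy demo: four linear layers stand in for the Flux layer stack.
stack = nn.ModuleList(nn.Linear(64, 64) for _ in range(4))
x = torch.randn(1, 64)
ref = x
with torch.no_grad():
    for layer in stack:
        ref = layer(ref)
print(sorted(layer_costs(stack, x, ref).items(), key=lambda kv: kv[1])[:3])
```

Because only one layer is degraded at a time, each (layer, quantization) pair gets a scalar cost that isolates that layer's sensitivity.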
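Turning those costs into a recipe is a budgeted selection problem: spend the error budget where it buys the most bytes. The README does not say how its recipes were actually derived, so the greedy heuristic below, along with its made-up `costs` and `savings` numbers, is purely hypothetical: it quantizes the layers with the cheapest error per byte saved until a target saving is reached.

```python
# Hypothetical greedy recipe builder: reach a target size reduction for the
# smallest total cost. All numbers and names here are made up for illustration.
def build_recipe(costs: dict, savings: dict, target_saving: float) -> dict:
    """costs[(layer, quant)]   -> measured MSE cost
       savings[(layer, quant)] -> bytes saved vs. the 16-bit layer"""
    recipe, saved = {}, 0.0
    # Consider the cheapest error-per-byte options first.
    for layer, quant in sorted(costs, key=lambda k: costs[k] / savings[k]):
        if saved >= target_saving:
            break
        if layer in recipe:  # at most one quantization per layer
            continue
        recipe[layer] = quant
        saved += savings[(layer, quant)]
    return recipe

# Toy example with two layers and two candidate quantizations.
costs = {(0, "Q8_0"): 0.01, (0, "Q4_1"): 0.20,
         (1, "Q8_0"): 0.05, (1, "Q4_1"): 0.08}
savings = {(0, "Q8_0"): 50.0, (0, "Q4_1"): 120.0,
           (1, "Q8_0"): 50.0, (1, "Q4_1"): 120.0}
print(build_recipe(costs, savings, target_saving=160.0))
```

A true optimum would require something like a knapsack-style search over the (layer, quantization) choices, but a greedy pass like this illustrates the size-for-cost trade-off the README describes.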