Quantization scripts
Could you share the scripts you used to quantize both the transformer and text_encoder_2? I want to reproduce it using a different merged Flux checkpoint. Thanks in advance!
This is the fastest code I have tried so far: 30 seconds to generate a 1024x1024 image on an RTX 3080. That's faster than SDXL and many times better quality. Pretty amazing, really. I think @HighCWu has something here. It could probably use some way to add LoRAs. GGUF support would be really awesome.
You can load any transformer with this; just re-use @HighCWu's text_encoder_2:

import torch
from diffusers import FluxPipeline
from model import T5EncoderModel, FluxTransformer2DModel  # HighCWu's classes from his model.py

flux = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=None,
    text_encoder_2=None,
    torch_dtype=torch.bfloat16,
)

# better to run the non-quantized version of this if you can
text_encoder_2: T5EncoderModel = T5EncoderModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
)
flux.text_encoder_2 = text_encoder_2

model_id = "your other flux model"  # <---- any flux model
transformer: FluxTransformer2DModel = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
)
flux.transformer = transformer
flux.enable_model_cpu_offload()
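For reference, here is a rough sketch of generating an image with the assembled pipeline; the prompt, step count, and output filename are just placeholders:

# placeholder prompt and settings; typical values for FLUX.1-dev
image = flux(
    "a photo of an astronaut riding a horse",
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_output.png")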
OK, I got it working, thanks for the hints:
- You need to convert the fp8 model to diffusers format with https://github.com/huggingface/diffusers/blob/main/scripts/convert_flux_to_diffusers.py. This may require adding "model.diffusion_model" to each key before mapping. Make sure to save it in bf16; I tried fp8 formats and they are not compatible.
- You need to load it with this codebase, passing a quantization_config (BitsAndBytesConfig) to FluxTransformer2DModel.from_pretrained.
- Save the model with .save_pretrained.
I first went the wrong way and tried quantizing with the official bitsandbytes branch (https://github.com/huggingface/diffusers/pull/9213); it is bugged and produces wrong layer shapes after saving.
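For what it's worth, a rough sketch of steps 2 and 3; the paths are placeholders, and I'm assuming HighCWu's FluxTransformer2DModel accepts a transformers-style BitsAndBytesConfig via quantization_config, as described above:

import torch
from transformers import BitsAndBytesConfig
from model import FluxTransformer2DModel  # HighCWu's model.py

# 4-bit NF4 config, computing in bf16
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# load the bf16 diffusers-format checkpoint produced by the conversion script (placeholder path)
transformer = FluxTransformer2DModel.from_pretrained(
    "path/to/converted-diffusers-checkpoint",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# step 3: write out the quantized weights so they can be reloaded directly later (placeholder path)
transformer.save_pretrained("path/to/flux-transformer-4bit")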
For Civitai models, you can do this after you download the file from Civitai:

f = FluxPipeline.from_single_file(
    filepath_to_local_file,  # path to the downloaded .safetensors file
    scheduler=None,
    tokenizer=None,
    tokenizer_2=None,
    # transformer is deliberately NOT set to None; it is the only component we want loaded
    text_encoder=None,
    vae=None,
    text_encoder_2=None,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
)
f.save_pretrained("yournewfluxfolder/" + your_model_name)
This will save only the transformer, creating a transformer/ subfolder. Then load it on its own from that subfolder, just like you normally would:

transformer: FluxTransformer2DModel = FluxTransformer2DModel.from_pretrained(
    "yournewfluxfolder/" + your_model_name,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
)
flux.transformer = transformer
I found the results are way better if you can run the non-quantized t5_xxl model (text_encoder_2). I was able to do this with my second 10 GB GPU. That said, running only HighCWu's 4-bit version still looks just as good as the full version in almost all cases, and it only takes about 20 seconds to generate a 1024x1024 image on my RTX 3080. Even the full version of t5_xxl is severely limited, so I don't expect much from it (especially for NSFW). Until someone trains a better T5 for this, we are stuck with it. I hear it's really hard to train certain content, and I am pretty sure that's because of the T5.
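For anyone trying the same thing, this is roughly what I mean by running the full text_encoder_2 on a second GPU; the device index is an assumption about your setup, and you may need to adjust device handling if you also use CPU offload:

import torch
from transformers import T5EncoderModel  # the stock transformers class, not the 4-bit one

# full bf16 T5-XXL encoder, roughly 9-10 GB, so it wants its own GPU
text_encoder_2_full = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
).to("cuda:1")  # assumption: the second GPU shows up as cuda:1

flux.text_encoder_2 = text_encoder_2_full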
Another thing I was able to do is img2img, using this as a refiner for SDXL models. The latents won't convert, but the PIL image will. You can run SDXL for n steps and then do 4-8 steps with Flux for the final image. I had to use the small HighCWu version of T5 for this, though, because img2img takes up more memory than my 3080 can handle. You don't really need the full T5 as much here anyway, since you're mostly going off the pre-generated image.
from diffusers import FluxImg2ImgPipeline

flux_img2img = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    text_encoder_2=text_encoder_2_small,  # use HighCWu's text_encoder_2 for this
    transformer=transformer,  # whatever 4-bit transformer you are using
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
)
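A rough sketch of the hand-off, assuming sdxl_pipe is your already-loaded SDXL pipeline and prompt is your prompt string:

# generate the base image with SDXL, then refine the PIL image with Flux
base_image = sdxl_pipe(prompt, num_inference_steps=30).images[0]

flux_img2img.enable_model_cpu_offload()
refined = flux_img2img(
    prompt=prompt,
    image=base_image,          # pass the PIL image, not the SDXL latents
    strength=0.3,              # low strength: roughly 4-8 effective denoising steps
    num_inference_steps=20,
    height=1024,
    width=1024,
).images[0]
refined.save("refined.png")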
@HighCWu @megachad I tried loading the text_encoder_2 (T5EncoderModel) using this piece of code:

text_encoder_2 = T5EncoderModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
)

I get this error while running it in Kaggle. I guess it is related to the HQQ quant config; I tried writing the config and then passing it to the T5EncoderModel class, but that didn't work either. I will share the error here.
UnboundLocalError Traceback (most recent call last)
Cell In[7], line 25
12 transformer_nf4 = FluxTransformer2DModel.from_pretrained(
13 bfl_repo,
14 subfolder="transformer",
15 quantization_config=nf4_config,
16 torch_dtype=torch.bfloat16
17 )
19 # text_encoder_2_fp8 = T5EncoderModel.from_pretrained(
20 # bfl_repo,
21 # subfolder="text_encoder_2",
22 # torch_dtype=torch.bfloat16
23 # )
---> 25 text_encoder_2 = T5EncoderModel.from_pretrained(
26 "HighCWu/FLUX.1-dev-4bit",
27 subfolder="text_encoder_2",
28 torch_dtype=torch.bfloat16
29 )
File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3647, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, weights_only, *model_args, **kwargs)
3645 if pre_quantized or quantization_config is not None:
3646 if pre_quantized:
-> 3647 config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
3648 config.quantization_config, quantization_config
3649 )
3650 else:
3651 config.quantization_config = quantization_config
File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/auto.py:173, in AutoHfQuantizer.merge_quantization_configs(cls, quantization_config, quantization_config_from_args)
170 warning_msg = ""
172 if isinstance(quantization_config, dict):
--> 173 quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
175 if (
176 isinstance(quantization_config, (GPTQConfig, AwqConfig, FbgemmFp8Config))
177 and quantization_config_from_args is not None
178 ):
179 # special case for GPTQ / AWQ / FbgemmFp8 config collision
180 loading_attr_dict = quantization_config_from_args.get_loading_attributes()
File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/auto.py:103, in AutoQuantizationConfig.from_dict(cls, quantization_config_dict)
97 raise ValueError(
98 f"Unknown quantization type, got {quant_method} - supported types are:"
99 f" {list(AUTO_QUANTIZER_MAPPING.keys())}"
100 )
102 target_cls = AUTO_QUANTIZATION_CONFIG_MAPPING[quant_method]
--> 103 return target_cls.from_dict(quantization_config_dict)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/quantization_config.py:269, in HqqConfig.from_dict(cls, config)
264 @classmethod
265 def from_dict(cls, config: Dict[str, Any]):
266 """
267 Override from_dict, used in AutoQuantizationConfig.from_dict in quantizers/auto.py
268 """
--> 269 instance = cls()
270 instance.quant_config = config["quant_config"]
271 instance.skip_modules = config["skip_modules"]
File /opt/conda/lib/python3.10/site-packages/transformers/utils/quantization_config.py:244, in HqqConfig.__init__(self, nbits, group_size, view_as_float, axis, dynamic_config, skip_modules, **kwargs)
242 self.quant_config[key] = HQQBaseQuantizeConfig(**dynamic_config[key])
243 else:
--> 244 self.quant_config = HQQBaseQuantizeConfig(
245 **{
246 "nbits": nbits,
247 "group_size": group_size,
248 "view_as_float": view_as_float,
249 "axis": axis,
250 }
251 )
253 self.quant_method = QuantizationMethod.HQQ
254 self.skip_modules = skip_modules
UnboundLocalError: local variable 'HQQBaseQuantizeConfig' referenced before assignment
Please guide me here. Thanks!!
You have to import HighCWu's model.py from GitHub...

from model import T5EncoderModel, FluxTransformer2DModel
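If it helps, something like this should fetch it; this assumes model.py sits at the root of the HighCWu/FLUX.1-dev-4bit repo on the Hub, so adjust the repo and filename if yours differ:

import shutil
from huggingface_hub import hf_hub_download

# download model.py and place it next to your script so `from model import ...` resolves
path = hf_hub_download("HighCWu/FLUX.1-dev-4bit", "model.py")
shutil.copy(path, "model.py")

from model import T5EncoderModel, FluxTransformer2DModel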
It does not run on a Colab T4, ever. I did not try the quantization scripts; I only tried the code to run the model, and it did not work. On a Colab T4 I installed PyTorch with:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118