How to achieve 4-bit quantization?
Can you share the code for a 4-bit quantization implementation?
For the transformer, just use his class with load_in_4bit=True. It will run any FLUX transformer. No need to do anything else.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU; remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")
Do you mean this, and is it correct?
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Keyword arguments {'load_in_4bit': True} are not expected by FluxPipeline and will be ignored.
Loading pipeline components...: 100% 7/7 [00:43<00:00, 3.20s/it]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
Loading checkpoint shards: 100% 2/2 [00:39<00:00, 19.41s/it]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
ValueError Traceback (most recent call last)
in <cell line: 5>()
3
4 pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
----> 5 pipe.enable_model_cpu_offload()
6 reset_device_map()
7 enable_model_cpu_offload()
/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py in enable_model_cpu_offload(self, gpu_id, device)
1005 is_pipeline_device_mapped = self.hf_device_map is not None and len(self.hf_device_map) > 1
1006 if is_pipeline_device_mapped:
-> 1007 raise ValueError(
   1008             "It seems like you have activated a device mapping strategy on the pipeline so calling `enable_model_cpu_offload()` isn't allowed. You can call `reset_device_map()` first and then call `enable_model_cpu_offload()`."
   1009         )

ValueError: It seems like you have activated a device mapping strategy on the pipeline so calling `enable_model_cpu_offload()` isn't allowed. You can call `reset_device_map()` first and then call `enable_model_cpu_offload()`.
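A minimal sketch of the ordering the error message asks for: clear the device map on the pipeline object before requesting CPU offload (both are methods of the pipeline, not standalone functions).

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
pipe.reset_device_map()          # undo the "balanced" device mapping first
pipe.enable_model_cpu_offload()  # then CPU offload is allowed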
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
model = AutoModelForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
ValueError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import AutoTokenizer, AutoModelForCausalLM
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
4 model = AutoModelForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
1 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
1047 return CONFIG_MAPPING[pattern].from_dict(config_dict, **unused_kwargs)
1048
-> 1049 raise ValueError(
1050 f"Unrecognized model in {pretrained_model_name_or_path}. "
1051 f"Should have a model_type
key in its {CONFIG_NAME}, or contain one of the following strings "
ValueError: Unrecognized model in black-forest-labs/FLUX.1-dev. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, m...
from transformers import GPTNeoForCausalLM, GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
model = GPTNeoForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
OSError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import GPTNeoForCausalLM, GPT2Tokenizer
2
----> 3 tokenizer = GPT2Tokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
4 model = GPTNeoForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
2012 # loaded directly from the GGUF file.
2013 if all(full_file_name is None for full_file_name in resolved_vocab_files.values()) and not gguf_file:
-> 2014 raise EnvironmentError(
2015 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
2016 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
OSError: Can't load tokenizer for 'black-forest-labs/FLUX.1-dev'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'black-forest-labs/FLUX.1-dev' is the correct path to a directory containing all relevant files for a GPT2Tokenizer tokenizer.
Read the model card... import from model.py in his GitHub repo... not from Hugging Face.
What do you mean? Can you write working code? I tried many changes on a Colab T4 and it didn't work.
I don't use Colab. Here's the GitHub link from the model card though: https://github.com/HighCWu/flux-4bit
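For reference, a rough sketch of what "import from model.py" could look like. The class names (T5EncoderModel, FluxTransformer2DModel) and the HighCWu/FLUX.1-dev-4bit checkpoint id are assumptions based on that repo; check its README and model card for the exact names.

import torch
from diffusers import FluxPipeline
# model.py comes from the cloned https://github.com/HighCWu/flux-4bit repo,
# not from the transformers/diffusers packages (class names are assumptions)
from model import T5EncoderModel, FluxTransformer2DModel

# 4-bit text encoder and transformer from the quantized checkpoint (assumed repo id)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# plug the quantized modules into the stock FLUX pipeline
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()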
Your first problem is this:
from transformers import GPTNeoForCausalLM, GPT2Tokenizer
This isn't GPT... it is FLUX. READ THE MODEL CARD.
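Separately from that repo, recent diffusers releases (roughly 0.31 and later) support 4-bit quantizing the FLUX transformer directly via a BitsAndBytesConfig passed to the component, rather than a bare load_in_4bit flag on the pipeline (which, as the warning above shows, FluxPipeline ignores). A minimal sketch, assuming a recent diffusers and bitsandbytes are installed:

import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline

# 4-bit NF4 quantization config for the transformer (the heaviest component)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# pass the quantized transformer into the pipeline; the other components load in bf16
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()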