Input tokens limited
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens:
This happens when your prompt is longer than CLIP can handle. Flux also has a second text encoder, T5, which can handle up to 512 tokens (only 256 on Schnell). You have to pass that limit explicitly in the generation call with max_sequence_length=512.
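For example, a minimal sketch of such a generation call (the model id, dtype and sampler settings here are just placeholders, not a recommendation):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

long_prompt = "..."  # a prompt that is longer than 77 CLIP tokens

image = pipe(
    prompt=long_prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=512,  # lets the T5 encoder see up to 512 tokens (use 256 on Schnell)
).images[0]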
Hi, thanks. I'm trying to force my FLUX Dev Colab (which uses CLIP by default) to use T5. I added max_sequence_length to my pipe call, but the Colab keeps using CLIP ("The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens"), even with:
image = pipe(
    prompt=processed_caption,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    width=width1 if i == 0 else width2,
    height=height1 if i == 0 else height2,
    generator=generator,
    max_sequence_length=512,
).images[0]
@QES Both the CLIP and T5 embeddings are passed to the model; T5 just supports a longer sequence length. Without seeing your pipeline load statement I can't say for sure that T5 is being loaded, but it probably is. The warning will still show up, but generation will work fine, including the T5 tokens past 77. Don't try to disable CLIP; that likely won't work well.
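If you want to double-check how long the prompt is for each encoder, a rough snippet like this should work on a loaded pipe (it assumes the pipeline exposes the usual tokenizer / tokenizer_2 attributes for CLIP and T5, as current diffusers Flux pipelines do):

# Rough token-count check; `pipe` is an already-loaded Flux pipeline and
# `prompt` is your full prompt string.
clip_ids = pipe.tokenizer(prompt).input_ids    # CLIP tokenizer (the pipeline keeps only the first 77)
t5_ids = pipe.tokenizer_2(prompt).input_ids    # T5 tokenizer (kept up to max_sequence_length)
print(f"prompt is {len(clip_ids)} CLIP tokens and {len(t5_ids)} T5 tokens")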
Thanks. I ran some tests and realized that the 77-token message keeps showing, BUT the whole prompt is processed (I put precise details at the end of a long-ass prompt ;-)
This is my code; with it I'm unable to pass a prompt longer than 77 tokens.
import torch
from diffusers.utils import load_image
from diffusers.pipelines.flux.pipeline_flux_controlnet import FluxControlNetPipeline
from diffusers.models.controlnet_flux import FluxControlNetModel
from huggingface_hub import login
import logging
from accelerate import infer_auto_device_map
# Set up logging
logging.basicConfig(level=logging.INFO)
logging.info("Loading models...")
base_model = "black-forest-labs/FLUX.1-dev"
controlnet_model = "promeai/FLUX.1-controlnet-lineart-promeai"
controlnet_model = 'InstantX/FLUX.1-dev-Controlnet-Canny'
controlnet = FluxControlNetModel.from_pretrained(
    controlnet_model, torch_dtype=torch.float32
)
pipe = FluxControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.float32
)
pipe.to("cpu")  # .to() cannot be combined with device_map, so load normally and move to CPU
logging.info("Loading control image...")
control_image = load_image("./images/house.jpg")
logging.info("Running inference...")
prompt = "ultra realistic modern residential building at morning on a lively suburban street with nearby tall buildings. Feature a textured concrete facade with warm wooden panels and staggered balconies with glass balustrades lit by green LED lights. Include large, white illuminated transparent windows and an elegant entrance with a garden. Show street with parked cars, detailed asphalt, and ambient street lighting with a sharp background with clouds"
control_net = 0.8 # Strong adherence to the raw sketch
inference = 30 # Reduced for CPU performance
guidance_scale = 6 # Strict adherence to the prompt
seed = 76286282
image_number = 28
torch.manual_seed(seed) # Keep the seed value the same for reproducibility
# Set the file name with appended parameters
image_name = f"./image_{image_number}_controlnet-{control_net}_inference-{inference}_guidance-{guidance_scale}.jpg"
height = 568
width = 1024
image = pipe(
    prompt,
    control_image=control_image,
    controlnet_conditioning_scale=control_net,
    num_inference_steps=inference,
    guidance_scale=guidance_scale,
    height=height,
    width=width,
).images[0]
logging.info("Saving image...")
image.save(image_name)
logging.info("Image saved successfully.")
@anubhav0711 The 77-token limit only applies to the CLIP text encoder. It won't really matter, since the main text encoder is T5-XXL, which can handle 512 tokens.
So it should still work, even with the warning.
Exactly, I ran some tests. You get the 77-token limit message, but if you add visual elements at the end of a long-ass prompt, they still appear in the image.
So does that mean the "77 token limit" message is just a warning and has no effect?
@dieptran It does technically have an effect, but you can safely ignore it; long prompts will still work (up to 512 T5 tokens, that is). Just make sure to set max_sequence_length=512 so T5 can read the whole prompt.
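For the ControlNet script posted above, that just means adding the argument to the existing call (FluxControlNetPipeline accepts the same parameter as far as I can tell), roughly:

image = pipe(
    prompt,
    control_image=control_image,
    controlnet_conditioning_scale=control_net,
    num_inference_steps=inference,
    guidance_scale=guidance_scale,
    height=height,
    width=width,
    max_sequence_length=512,  # let T5 encode the full prompt instead of stopping at the CLIP limit
).images[0]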