AssertionError: You do not have CLIP state dict!

#2 opened by PixelClassisist

I get the following error when trying to use this in Forge. Your TEXT-detail-improved HiT model works fine, though. Any ideas?

Could you specify what you mean by "this"? Which model exactly is not working for you? Make sure you use the same variant that worked with HiT; e.g., if you used the Text Encoder only version (the one with TE-only in the filename) of HiT, then also try the TE-only version of whichever model you were referring to.

Thanks for the reply. I'm referring to "Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors". Currently I'm using "ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF", and it works fine; however, I often use very long prompts, so I thought the Long version might be better suited. In the Files and versions tab for "Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors" I can't see a TE-only option. Am I missing something, perhaps?

Oh, I am sorry about my confusion! /o
I just clicked this in my inbox and failed to see we're discussing Long-CLIP, not "normal" CLIP. Sorry about that!

You need to adjust (expand) the embeddings and "inject" the Long-CLIP model for that to work (see the sketch below).
https://github.com/SeaArtLab/ComfyUI-Long-CLIP did that for SD and SDXL, while I contributed the Flux node via a pull request.
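To make "expand the embeddings" concrete, here is a minimal sketch of the core step, assuming an OpenAI-style CLIP text encoder: standard CLIP has a 77-token positional-embedding table, while Long-CLIP uses 248 tokens, so the table must be stretched before the Long-CLIP weights can be loaded on top. This is my illustration, not the actual ComfyUI-Long-CLIP code; the function name and the state-dict key are hypothetical, and Long-CLIP's real "knowledge-preserving stretching" is more nuanced than plain interpolation.

```python
import torch
import torch.nn.functional as F

def expand_positional_embedding(pos_emb: torch.Tensor, new_len: int = 248) -> torch.Tensor:
    """Stretch a (77, dim) CLIP positional-embedding table to (new_len, dim)
    by linear interpolation, so Long-CLIP weights can be loaded on top.
    Illustrative only; Long-CLIP's published stretching scheme differs."""
    # F.interpolate expects (batch, channels, length), so reshape (77, dim) -> (1, dim, 77)
    stretched = F.interpolate(
        pos_emb.T.unsqueeze(0),
        size=new_len,
        mode="linear",
        align_corners=True,
    )
    return stretched.squeeze(0).T  # back to (new_len, dim)

# Hypothetical usage: resize the table inside a loaded state dict, then load it
# into a text encoder whose config expects 248 positions (the "injection" step).
# The key below is the Hugging Face transformers layout; other formats differ.
# sd["text_model.embeddings.position_embedding.weight"] = expand_positional_embedding(
#     sd["text_model.embeddings.position_embedding.weight"]
# )
```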

Unfortunately, I don't use Forge (or do much inference at all; my art became tweaking the model itself, not so much generating images, haha!). But I hope the details for ComfyUI will serve as guidance for what you'd need to implement in Forge, or for requesting the implementation from the Forge authors / the community.

Hope that helps / is a starting point, at least!
