
Loading the CLIPModel (or CLIPVisionModel) that matches this checkpoint

#8
by danaarad - opened

Hi, I'm having trouble finding a CLIPModel checkpoint (or just the CLIPVisionModel) that matches the CLIPTextModel used in this version. The open-clip library provides a different interface (it does not use the CLIPVisionModel class), and other CLIPModel checkpoints do not match the projection dim of this version (1024, while other checkpoints here are 768 or 512). Does anyone have a solution, or can you refer me to the correct checkpoint?
Thanks!
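For anyone hitting the same mismatch: in the transformers API, `hidden_size` (the width of `last_hidden_state`, which is what the Stable Diffusion UNet cross-attends to) and `projection_dim` (the width of the contrastive text/image embedding space) are separate config fields, so it's worth checking which of the two is actually 1024 for a given checkpoint. A minimal sketch with a tiny, made-up config (all numbers here are illustrative, not taken from any real checkpoint):

```python
import torch
from transformers import CLIPTextConfig, CLIPTextModel

# Tiny illustrative config: hidden_size and projection_dim are independent fields.
cfg = CLIPTextConfig(
    vocab_size=100,
    hidden_size=32,           # width of last_hidden_state (what SD conditions on)
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    max_position_embeddings=16,
    projection_dim=24,        # width of the contrastive embedding space
)
model = CLIPTextModel(cfg)

input_ids = torch.randint(0, 100, (1, 8))
out = model(input_ids=input_ids)
print(out.last_hidden_state.shape)   # torch.Size([1, 8, 32]) -- hidden_size, not projection_dim
```

The same check on a downloaded checkpoint (`CLIPTextConfig.from_pretrained(...)`) shows which field differs between models before comparing vision-side shapes.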

Hello, did you get the CLIP vision model in the end? I tested the laion/CLIP-ViT-H-14-laion2B-s32B-b79K model, but the results and parameters are not the same. I'm confused: which CLIP model is used for SD2/2.1?

Hi, do you figure it out?

Hi, I didn't figure it out and ended up using a previous SD version. If anyone has any input, please share!

Also interested in this!

I have tested laion/CLIP-ViT-H-14-laion2B-s32B-b79K too, but it did not seem to work well. I found that the output of the CLIPTextModel in SD2.1 has dimension 1024, while the output of the CLIPVisionModel in laion/CLIP-ViT-H-14-laion2B-s32B-b79K is 1280, so they are not compatible. I then used CLIPVisionModelWithProjection with laion/CLIP-ViT-H-14-laion2B-s32B-b79K; after that, the dimensions match. But I'm not sure whether this is correct.
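To flesh out the approach above: `CLIPVisionModelWithProjection` adds the visual projection layer on top of the vision transformer, so its `image_embeds` output has width `projection_dim` rather than the width of the raw hidden states. A small sketch with a tiny, made-up config so it runs without downloading the real checkpoint (the sizes below just mirror the 1280 → 1024 relationship, scaled down; they are assumptions for illustration):

```python
import torch
from transformers import CLIPVisionConfig, CLIPVisionModelWithProjection

# Tiny illustrative config; hidden_size stands in for ViT-H's 1280 and
# projection_dim for the 1024-dim space the text side also projects into.
cfg = CLIPVisionConfig(
    hidden_size=40,
    intermediate_size=80,
    num_hidden_layers=2,
    num_attention_heads=4,
    image_size=32,
    patch_size=8,
    projection_dim=24,
)
model = CLIPVisionModelWithProjection(cfg)

pixel_values = torch.randn(1, 3, 32, 32)
out = model(pixel_values=pixel_values)
print(out.last_hidden_state.shape)  # torch.Size([1, 17, 40]): 16 patches + CLS, hidden_size wide
print(out.image_embeds.shape)       # torch.Size([1, 24]): projected to projection_dim
```

For the real checkpoint, `CLIPVisionModelWithProjection.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")` should likewise yield 1024-dim `image_embeds`, which is what makes the shapes line up with the 1024-dim text side as described above.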
