Model weights

#1
by rbareja - opened

Hello,

I submitted a request to download the model weights. Could you please give me an update on the request?

Thanks,
Rohan

Thanks for your interest, Rohan. Access should now be granted. Could you please confirm?

Yes, I have access. Thank you!

Hi Siqi,

I am using this vision transformer code to create a vit_huge.
https://github.com/facebookresearch/dino/blob/main/vision_transformer.py

def vit_huge(patch_size=16, **kwargs):
    model = VisionTransformer(
        patch_size=patch_size, embed_dim=1280, depth=32, num_heads=16, mlp_ratio=5.3375,
        qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model

But I am getting a size mismatch: "size mismatch for blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([1280, 3416]) from checkpoint, the shape in current model is torch.Size([1280, 6832])." The only argument I pass in is patch_size=14.

Could you tell me what might be wrong in the architecture?

Thanks,
Rohan

Paige AI org

Hi Rohan,

It looks like the DINOv1 code you're using does not use a SwiGLU MLP layer, which is why you see the shape mismatch error.
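To make the mismatch concrete, here is a rough sketch of the shape arithmetic. It assumes a packed SwiGLU layout, where fc1 produces both the gate and value halves and fc2 consumes only one half, and it uses the (out_features, in_features) weight layout of nn.Linear. The helper name `mlp_weight_shapes` is made up for illustration, and the hidden size of 6832 is taken from the error message.

```python
def mlp_weight_shapes(embed_dim, hidden_features, packed_swiglu):
    """Return (fc1.weight shape, fc2.weight shape) in nn.Linear's (out, in) layout.

    Hypothetical helper for illustration only; it does not build a model.
    """
    # A packed SwiGLU splits the fc1 output into two halves and multiplies them,
    # so fc2 only sees half of the hidden features.
    fc2_in = hidden_features // 2 if packed_swiglu else hidden_features
    return (hidden_features, embed_dim), (embed_dim, fc2_in)


embed_dim = 1280
hidden_features = 6832  # value reported in the error message

plain_fc1, plain_fc2 = mlp_weight_shapes(embed_dim, hidden_features, packed_swiglu=False)
swiglu_fc1, swiglu_fc2 = mlp_weight_shapes(embed_dim, hidden_features, packed_swiglu=True)

print(plain_fc2)   # (1280, 6832): what a plain Mlp layer expects
print(swiglu_fc2)  # (1280, 3416): what the checkpoint contains
```

The fc1 shape is the same in both cases, which would explain why only fc2 is reported as mismatched.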

We have not validated this checkpoint on other ViT code bases, so we highly recommend loading it with the timm library, i.e.:

model = timm.create_model("hf-hub:paige-ai/Virchow", pretrained=True, mlp_layer=timm.layers.SwiGLUPacked, act_layer=torch.nn.SiLU)

(full example: https://huggingface.co/paige-ai/Virchow#image-embeddings)

If timm doesn't work for your use case please let us know and we can try to help get the checkpoint working with other ViT code bases!

Hi Adam,

Yes, a checkpoint that is compatible with other ViT code bases and similar to other checkpoints would be helpful. I am extracting features from a couple of pathology models, and it would be good to be consistent.

In the case of the DINOv1 code base you'll have to swap out the Mlp layer class with an appropriate SwiGLU implementation. Something like this may work (although I haven't validated it):

class SwiGluMlp(nn.Module):
    def __init__(
            self,
            in_features,
            hidden_features=None,
            out_features=None,
            act_layer=nn.SiLU,
            drop=0.,
            bias=True,
            gate_last=False,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        assert hidden_features % 2 == 0

        self.chunk_dim = -1
        self.gate_last = gate_last

        self.fc1 = nn.Linear(in_features, hidden_features, bias=bias)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features // 2, out_features, bias=bias)

    def forward(self, x):
        x = self.fc1(x)
        x1, x2 = x.chunk(2, dim=self.chunk_dim)
        x = x1 * self.act(x2) if self.gate_last else self.act(x1) * x2
        x = self.fc2(x)
        return x

If this works, please use the timm implementation as the ground truth to verify the correctness of the model outputs; there may be other differences besides this layer.
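Before comparing outputs, a cheap first check is to diff parameter names and shapes between the checkpoint and your model. The sketch below is only an illustration: `diff_shapes` is a made-up helper that works on plain dictionaries, and the toy entries reuse the shapes from the error message above. With real models you could build the dictionaries with something like {k: tuple(v.shape) for k, v in state_dict.items()}.

```python
def diff_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape tuple} mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names the model has but the checkpoint lacks
      unexpected - names the checkpoint has but the model lacks
      mismatched - {name: (checkpoint shape, model shape)} for differing shapes
    """
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    mismatched = {
        k: (checkpoint_shapes[k], model_shapes[k])
        for k in checkpoint_shapes
        if k in model_shapes and checkpoint_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched


# Toy example using the shapes from the error message in this thread.
checkpoint_shapes = {"blocks.0.mlp.fc2.weight": (1280, 3416)}
model_shapes = {"blocks.0.mlp.fc2.weight": (1280, 6832)}

missing, unexpected, mismatched = diff_shapes(checkpoint_shapes, model_shapes)
print(mismatched)
```

Empty results here only show that the weights can be loaded. They do not replace comparing the outputs against timm.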

adamcasson changed discussion status to closed
