How to use Swin v2?
Hi, I'm a student from China.
I want to use Swin v2 as a backbone to extract features, but I don't know how to adapt the model to my project.
First, my input size is torch.Size((1, 5, 224, 224)), but the pretrained Swin v2 seems to only support torch.Size((3, 224, 224)).
Second, I want to extract feature maps; in other words, the output from Swin v2 should have size torch.Size((1, 1024, 20, 20)).
How can I solve these two problems?
Thanks in advance.
Hi,
To use Swinv2, you can take a look at the available models: https://huggingface.co/models?other=swinv2.
This particular checkpoint is Swin v1.
As for your first question, you might need to replace the initial projection layer. Note, however, that this will require you to train that layer from scratch. It's advised to reuse as many pre-trained weights as possible.
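A minimal sketch of swapping the projection layer, assuming the Hugging Face transformers implementation, where the patch-embedding convolution is reachable as model.embeddings.patch_embeddings.projection. The helper name replace_patch_projection and the tiny config are only illustrative (they keep the example runnable offline); in practice you would load the model with Swinv2Model.from_pretrained and a real checkpoint first, so that everything except the new layer keeps its pre-trained weights.

```python
import torch
from torch import nn
from transformers import Swinv2Config, Swinv2Model


def replace_patch_projection(model: Swinv2Model, in_channels: int) -> Swinv2Model:
    """Swap the patch-embedding conv for one that accepts `in_channels` inputs.

    Hypothetical helper: the new layer is randomly initialised and must be
    trained; all other weights of `model` are left untouched.
    """
    patch_embed = model.embeddings.patch_embeddings
    old = patch_embed.projection  # Conv2d(3, embed_dim, kernel=patch, stride=patch)
    patch_embed.projection = nn.Conv2d(
        in_channels,
        old.out_channels,
        kernel_size=old.kernel_size,
        stride=old.stride,
        padding=old.padding,
    )
    # Keep the bookkeeping consistent; some library versions validate
    # the channel count of the input against this attribute.
    patch_embed.num_channels = in_channels
    model.config.num_channels = in_channels
    return model


# Tiny, randomly initialised model (assumption: illustration only).
config = Swinv2Config(
    image_size=64, embed_dim=16, depths=[1, 1], num_heads=[1, 2], window_size=4
)
model = replace_patch_projection(Swinv2Model(config), in_channels=5).eval()

with torch.no_grad():
    out = model(pixel_values=torch.randn(1, 5, 64, 64))
print(out.last_hidden_state.shape)
```

Only the replaced convolution starts from random weights here, which matches the advice above to keep as many pre-trained weights as possible.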
As for your second question, we're currently adding support for the AutoBackbone API, which allows you to easily extract feature maps from vision backbones like Swin and ConvNext. For now, it's advised that you run a forward pass with output_hidden_states=True to get the intermediate features.
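A minimal sketch of that forward pass, again assuming the Hugging Face transformers Swinv2Model. The tiny, randomly initialised config is an assumption made so the example runs without downloading a checkpoint; with a real model you would use Swinv2Model.from_pretrained instead. The output object exposes hidden_states (token sequences) and reshaped_hidden_states (the same features laid out as batch, channels, height, width).

```python
import torch
from transformers import Swinv2Config, Swinv2Model

# Tiny, randomly initialised model (assumption: illustration only).
config = Swinv2Config(
    image_size=64, embed_dim=16, depths=[1, 1], num_heads=[1, 2], window_size=4
)
model = Swinv2Model(config).eval()

with torch.no_grad():
    outputs = model(
        pixel_values=torch.randn(1, 3, 64, 64),
        output_hidden_states=True,
    )

# One entry for the patch embeddings plus one per stage.
# hidden_states:          (batch, num_tokens, channels)
# reshaped_hidden_states: (batch, channels, height, width)
feature_maps = outputs.reshaped_hidden_states
for fmap in feature_maps:
    print(tuple(fmap.shape))
```

The spatial size of each map follows from the input resolution and the model's downsampling, so whether a particular target shape is reachable depends on the input size and the checkpoint you choose.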