Model card for nextvit_base.bd_ssld_6m_in1k_384

A Next-ViT image classification model, trained by the paper authors with SSLD distillation on an unspecified 6M-sample dataset and ImageNet-1k.

Model Details

  • Model Type: Image classification / feature backbone
  • Model Stats:
    • Params (M): 44.8
    • GMACs: 24.2
    • Activations (M): 66.0
    • Image size: 384 x 384
  • Pretrain Dataset: Unknown-6M
  • Dataset: ImageNet-1k
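  • Papers:
    • Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios: https://arxiv.org/abs/2207.05501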
  • Original: https://github.com/bytedance/Next-ViT
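As a rough guide to what the parameter count above means in practice, the sketch below estimates the size of the weights alone. It assumes float32 storage by default; the helper name `weight_megabytes` and the dtype table are illustrative, and activations, buffers, and optimizer state are not counted.

```python
# Rough weight-memory estimate from the parameter count in the stats above.
PARAMS_MILLIONS = 44.8  # "Params (M)" from the model card

# Bytes per element for common dtypes (float32 storage is an assumption here).
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_megabytes(params_millions: float, dtype: str = "float32") -> float:
    """Return the approximate weight size in MB (10**6 bytes) for a dtype."""
    num_params = params_millions * 1_000_000
    return num_params * BYTES_PER_ELEMENT[dtype] / 1_000_000


for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype}: ~{weight_megabytes(PARAMS_MILLIONS, dtype):.1f} MB")
```

Halving the element width halves the estimate, which is why half-precision inference is a common first step when memory is tight.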

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('nextvit_base.bd_ssld_6m_in1k_384', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
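To make explicit what the final line computes, here is a dependency-free sketch of the same two steps: a softmax scaled to percentages, followed by a top-k selection. The six toy logits and the helper names are hypothetical; the real model emits one logit per ImageNet-1k class.

```python
import math


def softmax_percent(logits):
    """Convert raw logits to percentages that sum to 100."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Hypothetical logits for a 6-class toy problem (not real model output).
toy_logits = [0.5, 2.0, -1.0, 3.5, 0.0, 1.0]
probs, indices = top_k(softmax_percent(toy_logits), k=5)

for p, i in zip(probs, indices):
    print(f"class {i}: {p:.2f}%")
```

As in the `torch.topk` call, the first returned list holds the scores and the second holds the class indices, both ordered from most to least likely.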

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'nextvit_base.bd_ssld_6m_in1k_384',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 96, 96, 96])
    #  torch.Size([1, 256, 48, 48])
    #  torch.Size([1, 512, 24, 24])
    #  torch.Size([1, 1024, 12, 12])

    print(o.shape)
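The shapes listed in the comments follow from dividing the input resolution by each stage's stride. The sketch below reproduces them without loading the model. The strides 4, 8, 16, and 32 are inferred from those shapes (384 / 96 = 4, and so on) rather than read from the network, and the floor division is an assumption for input sizes that are not multiples of 32.

```python
def feature_map_shapes(image_size, channels, strides, batch_size=1):
    """Return (N, C, H, W) shapes for each stage, given per-stage strides."""
    height, width = image_size
    return [
        (batch_size, c, height // s, width // s)
        for c, s in zip(channels, strides)
    ]


# Channel counts are taken from the shapes listed above; the strides are
# inferred from those shapes, not queried from the model.
CHANNELS = (96, 256, 512, 1024)
STRIDES = (4, 8, 16, 32)

for shape in feature_map_shapes((384, 384), CHANNELS, STRIDES):
    print(shape)
```

With a loaded `features_only=True` model, the `model.feature_info` attribute should report the actual channel counts and reductions, which is the more reliable source when in doubt.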

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'nextvit_base.bd_ssld_6m_in1k_384',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1024, 12, 12) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
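A common use of pooled embeddings is comparing images by cosine similarity. The sketch below shows the computation on hypothetical 4-dimensional vectors; real embeddings from this model have `num_features` entries and would first be converted to plain lists, for example with `output.squeeze(0).tolist()`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for illustration (not model output).
query = [0.2, 0.9, 0.1, 0.4]
near = [0.25, 0.8, 0.05, 0.5]   # points in roughly the same direction
far = [0.9, -0.1, 0.7, 0.0]     # points in a different direction

print(f"query vs near: {cosine_similarity(query, near):.3f}")
print(f"query vs far:  {cosine_similarity(query, far):.3f}")
```

Values close to 1 indicate similar directions in embedding space. Whether that corresponds to visual similarity for a given task is something to verify on your own data.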

Model Comparison

By Top-1

model                              top1 (%)  top1_err (%)  top5 (%)  top5_err (%)  param_count (M)
nextvit_large.bd_ssld_6m_in1k_384  86.542    13.458        98.142    1.858         57.87
nextvit_base.bd_ssld_6m_in1k_384   86.352    13.648        98.04     1.96          44.82
nextvit_small.bd_ssld_6m_in1k_384  85.964    14.036        97.908    2.092         31.76
nextvit_large.bd_ssld_6m_in1k      85.48     14.52         97.696    2.304         57.87
nextvit_base.bd_ssld_6m_in1k       85.186    14.814        97.59     2.41          44.82
nextvit_large.bd_in1k_384          84.924    15.076        97.294    2.706         57.87
nextvit_small.bd_ssld_6m_in1k      84.862    15.138        97.382    2.618         31.76
nextvit_base.bd_in1k_384           84.706    15.294        97.224    2.776         44.82
nextvit_small.bd_in1k_384          84.022    15.978        96.99     3.01          31.76
nextvit_large.bd_in1k              83.626    16.374        96.694    3.306         57.87
nextvit_base.bd_in1k               83.472    16.528        96.656    3.344         44.82
nextvit_small.bd_in1k              82.61     17.39         96.226    3.774         31.76
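To read the table as gains rather than absolute numbers, the sketch below computes how much top-1 accuracy each variant adds over the plain `bd_in1k` weights of the same size. The accuracies are copied from the table; the dictionary layout and the `gain` helper are illustrative.

```python
# Top-1 accuracies (%) copied from the comparison table above.
TOP1 = {
    ("small", "bd_in1k"): 82.61,
    ("small", "bd_in1k_384"): 84.022,
    ("small", "bd_ssld_6m_in1k"): 84.862,
    ("small", "bd_ssld_6m_in1k_384"): 85.964,
    ("base", "bd_in1k"): 83.472,
    ("base", "bd_in1k_384"): 84.706,
    ("base", "bd_ssld_6m_in1k"): 85.186,
    ("base", "bd_ssld_6m_in1k_384"): 86.352,
    ("large", "bd_in1k"): 83.626,
    ("large", "bd_in1k_384"): 84.924,
    ("large", "bd_ssld_6m_in1k"): 85.48,
    ("large", "bd_ssld_6m_in1k_384"): 86.542,
}


def gain(size, tag, baseline="bd_in1k"):
    """Top-1 improvement (percentage points) of `tag` over `baseline`."""
    return round(TOP1[(size, tag)] - TOP1[(size, baseline)], 3)


for size in ("small", "base", "large"):
    for tag in ("bd_in1k_384", "bd_ssld_6m_in1k", "bd_ssld_6m_in1k_384"):
        print(f"{size:5s} {tag:20s} +{gain(size, tag):.3f}")
```

For every model size in the table, the `_384` SSLD weights show the largest gain over the plain `bd_in1k` weights.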

Citation

@article{li2022next,
  title={Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios},
  author={Li, Jiashi and Xia, Xin and Li, Wei and Li, Huixia and Wang, Xing and Xiao, Xuefeng and Wang, Rui and Zheng, Min and Pan, Xin},
  journal={arXiv preprint arXiv:2207.05501},
  year={2022}
}