
MobileViTv2 (mobilevitv2-1.0-imagenet1k-256)

MobileViTv2 is the second version of MobileViT. It was proposed in "Separable Self-attention for Mobile Vision Transformers" by Sachin Mehta and Mohammad Rastegari, and first released in this repository. It is distributed under the Apple sample code license.

Disclaimer: The team releasing MobileViT did not write a model card for this model so this model card has been written by the Hugging Face team.

Model Description

MobileViTv2 is constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
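
To make the idea concrete, here is a minimal PyTorch sketch of a separable self-attention layer, written from the paper's description rather than taken from the released code. Each token receives a single scalar score, the scores are normalized over tokens and used to pool the keys into one context vector, and that vector then gates the values, so the cost grows linearly with the number of tokens. The class name `SeparableSelfAttention`, the use of `nn.Linear` on a `(batch, tokens, dim)` tensor, and the layer names are assumptions for illustration; the actual model operates on unfolded image patches and may differ in detail.

```python
import torch
from torch import nn


class SeparableSelfAttention(nn.Module):
    """Illustrative single-head separable self-attention (hypothetical simplification)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_scores = nn.Linear(dim, 1)    # one scalar context score per token
        self.to_key = nn.Linear(dim, dim)
        self.to_value = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, tokens, dim)
        scores = torch.softmax(self.to_scores(x), dim=1)               # (batch, tokens, 1)
        # Weighted sum of keys over tokens gives one context vector per sample
        context = (scores * self.to_key(x)).sum(dim=1, keepdim=True)   # (batch, 1, dim)
        values = torch.relu(self.to_value(x))                          # (batch, tokens, dim)
        # Broadcast the context vector over all tokens, then project
        return self.out_proj(context * values)


layer = SeparableSelfAttention(dim=8)
tokens = torch.randn(2, 5, 8)
print(layer(tokens).shape)
```

No token-by-token attention matrix is ever formed here, which is the property that distinguishes this layer from multi-headed self-attention.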

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes:

import requests
import torch
from PIL import Image
from transformers import MobileViTImageProcessor, MobileViTV2ForImageClassification

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the pretrained classifier
feature_extractor = MobileViTImageProcessor.from_pretrained("shehan97/mobilevitv2-1.0-imagenet1k-256")
model = MobileViTV2ForImageClassification.from_pretrained("shehan97/mobilevitv2-1.0-imagenet1k-256")

# Preprocess the image into a batch of PyTorch tensors
inputs = feature_extractor(images=image, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# The model predicts one of the 1,000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Currently, both the feature extractor and model support PyTorch.
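
The example above reports only the single most likely class. As a minimal sketch, the helper below turns a logits tensor into the top-k labels with their softmax probabilities. The function name `top_k_labels` is hypothetical, and the dummy logits and label map are stand-ins so the snippet runs without downloading the model.

```python
import torch


def top_k_labels(logits: torch.Tensor, id2label: dict, k: int = 5):
    """Return (label, probability) pairs for the k highest-scoring classes.

    Assumes logits has shape (1, num_classes), as for a single image.
    """
    probs = logits.softmax(dim=-1)[0]
    top = torch.topk(probs, k)
    return [(id2label[i.item()], p.item()) for p, i in zip(top.values, top.indices)]


# Stand-in values for illustration only
dummy_logits = torch.tensor([[0.1, 2.0, 0.5, 1.0]])
dummy_labels = {0: "a", 1: "b", 2: "c", 3: "d"}
print(top_k_labels(dummy_logits, dummy_labels, k=2))
```

With the real model, the call would be `top_k_labels(outputs.logits, model.config.id2label)`, using the names from the example above.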

Training data

The MobileViTv2 model was pretrained on ImageNet-1k, a dataset consisting of roughly 1.28 million training images and 1,000 classes.

BibTeX entry and citation info

@inproceedings{vision-transformer,
title = {Separable Self-attention for Mobile Vision Transformers},
author = {Sachin Mehta and Mohammad Rastegari},
year = {2022},
URL = {https://arxiv.org/abs/2206.02680}
}