MobileViTv2 + DeepLabv3 (shehan97/mobilevitv2-1.0-voc-deeplabv3)

MobileViTv2 model pre-trained on PASCAL VOC at resolution 512x512. It was introduced in Separable Self-attention for Mobile Vision Transformers by Sachin Mehta and Mohammad Rastegari and first released in this repository. The license used is the Apple sample code license.

Disclaimer: The team releasing MobileViT did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model Description

MobileViTv2 is constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
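As a rough illustration of what "separable self-attention" means, the NumPy sketch below follows the paper's description for a single sequence of k tokens with d features. A one-dimensional projection produces one context score per token (softmax over tokens). Those scores weight a key projection into a single context vector. That vector is then broadcast onto a ReLU-activated value projection. Cost therefore grows linearly in k, rather than quadratically as in multi-headed self-attention. The function and weight names (`separable_self_attention`, `w_i`, `w_k`, `w_v`, `w_o`) are hypothetical. Biases, dropout, and the 2D patch unfolding used in the real MobileViTv2 blocks are omitted, so treat this as a simplified sketch rather than the released implementation.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = 0) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Simplified separable self-attention (no biases, single sequence).

    x:   (k, d) token features
    w_i: (d, 1) projection to one context score per token
    w_k, w_v, w_o: (d, d) key, value and output projections
    """
    scores = softmax(x @ w_i, axis=0)                      # (k, 1), sums to 1 over tokens
    keys = x @ w_k                                         # (k, d)
    context = (scores * keys).sum(axis=0, keepdims=True)   # (1, d) single context vector
    values = np.maximum(x @ w_v, 0.0)                      # (k, d), ReLU on the value branch
    return (values * context) @ w_o                        # (k, d), context broadcast to every token
```

Because the context vector is a weighted sum over all tokens, every output token still sees global information, yet no k x k attention matrix is ever formed.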

The model in this repo adds a DeepLabV3 head to the MobileViTv2 backbone for semantic segmentation.
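DeepLabV3 heads are built around atrous (dilated) convolutions, which space a kernel's taps `rate` pixels apart to enlarge the receptive field without adding weights. The atrous spatial pyramid pooling (ASPP) module runs several such branches in parallel. The minimal single-channel NumPy sketch below shows the mechanism. The helper names are hypothetical. The rates 6, 12 and 18 used in the usage note are an assumption based on common DeepLabV3 defaults; check this checkpoint's `config.json` for the values it actually uses.

```python
import numpy as np

def effective_kernel_size(k: int, rate: int) -> int:
    # Spatial extent covered by a k x k kernel whose taps are `rate` pixels apart.
    return k + (k - 1) * (rate - 1)

def dilated_conv2d(x: np.ndarray, w: np.ndarray, rate: int) -> np.ndarray:
    """Single-channel atrous cross-correlation with zero 'same' padding (odd k assumed)."""
    k = w.shape[0]
    pad = (effective_kernel_size(k, rate) - 1) // 2
    xp = np.pad(x, pad)                      # zero padding on all sides
    h, wd = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each tap reads the input shifted by a multiple of `rate`.
            out += w[i, j] * xp[i * rate : i * rate + h, j * rate : j * rate + wd]
    return out
```

With these helpers, `[effective_kernel_size(3, r) for r in (6, 12, 18)]` gives `[13, 25, 37]`. Three 3x3 branches at those rates would therefore see 13-, 25- and 37-pixel windows while each keeps only 9 weights.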

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model:

import torch
import requests
from PIL import Image
from transformers import MobileViTImageProcessor, MobileViTV2ForSemanticSegmentation

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# MobileViTv2 reuses the MobileViT image processor for preprocessing
image_processor = MobileViTImageProcessor.from_pretrained("shehan97/mobilevitv2-1.0-voc-deeplabv3")
model = MobileViTV2ForSemanticSegmentation.from_pretrained("shehan97/mobilevitv2-1.0-voc-deeplabv3")

inputs = image_processor(images=image, return_tensors="pt")

# inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# logits: (batch_size, num_labels, height, width), at a lower resolution than the input
logits = outputs.logits

# per-pixel class index for the first (and only) image in the batch
predicted_mask = logits.argmax(1).squeeze(0)

Currently, both the image preprocessing and the model support PyTorch.
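The segmentation logits come out at a lower resolution than the input image, so a mask usually has to be resized before it can be overlaid on the photo. The NumPy sketch below takes one image's logits, shaped `(num_labels, h, w)` (for example `outputs.logits[0].numpy()`), and upsamples them to a target size. It then takes the per-pixel argmax. The function name `logits_to_mask` is hypothetical, and nearest-neighbour upsampling is a simplifying assumption; bilinear interpolation of the logits is a common alternative that gives smoother boundaries.

```python
import numpy as np

def logits_to_mask(logits: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Map (num_labels, h, w) logits to an (out_h, out_w) mask of class indices.

    Uses nearest-neighbour upsampling (an assumption for this sketch).
    """
    num_labels, h, w = logits.shape
    rows = (np.arange(out_h) * h) // out_h          # source row for each output row
    cols = (np.arange(out_w) * w) // out_w          # source column for each output column
    up = logits[:, rows[:, None], cols[None, :]]    # (num_labels, out_h, out_w)
    return up.argmax(axis=0)                        # most likely class per pixel
```

For the example above, `logits_to_mask(outputs.logits[0].numpy(), image.height, image.width)` would give a full-size mask. Its integer entries can be turned into class names through `model.config.id2label`.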

Training data

The MobileViTv2 backbone was pretrained on ImageNet-1k, a dataset of about 1.28 million training images across 1,000 classes. The MobileViTv2 + DeepLabV3 model was then fine-tuned on the PASCAL VOC2012 dataset.

BibTeX entry and citation info

@inproceedings{vision-transformer,
title = {Separable Self-attention for Mobile Vision Transformers},
author = {Sachin Mehta and Mohammad Rastegari},
year = {2022},
URL = {https://arxiv.org/abs/2206.02680}
}