
How to Use this Model for Zero-Shot Image Classification?

#2 by eclipticwonder - opened

Hi,

How can I use this model for zero-shot image classification? Could you provide a sample code snippet?

Hi,
Thanks for your interest. Here is an example of loading and evaluating the model:

import torch
import open_clip
from PIL import Image
from huggingface_hub import hf_hub_download

# Download the checkpoint trained on data up to 2016 and load it into OpenCLIP
filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-cumulative", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', pretrained=filename)
model.eval()  # model is in train mode by default; eval mode matters for layers like BatchNorm

tokenizer = open_clip.get_tokenizer('ViT-B-16')

# Preprocess the input image and tokenize the candidate labels
image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    # Embed the image and the texts, then L2-normalize both
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scaled cosine similarities, softmaxed over the candidate labels
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
fartashf changed discussion status to closed

Please note that these models are released to facilitate research on continual learning; refer to the model card for more code examples.
