---
tags:
- pytorch_model_hub_mixin
- model_hub_mixin
license: mit
---

### Official implementation of the PCME++ model pre-trained on CC3M, CC12M, and RedCaps

Zero-shot ImageNet-1k top-1 accuracy: 34.642% (slightly higher than the 34.22% reported in the paper)

- Paper: https://openreview.net/forum?id=ft1mr3WlGM
- GitHub: https://github.com/naver-ai/pcmepp
- A stronger checkpoint with 41.812% ImageNet-1k top-1 accuracy (mean-only zero-shot classification) is available at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M)

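The card does not spell out what "mean-only" zero-shot classification involves. One plausible reading is that each class prompt is ranked by the cosine similarity between mean embeddings alone, with the predicted standard deviations left out. The sketch below illustrates that reading in plain Python so it runs without downloading the model. The helper names (`l2_normalize`, `mean_only_zero_shot`) and the toy vectors are hypothetical and are not part of `hf_models`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_only_zero_shot(image_mean, class_means):
    """Return (best_class_index, similarities) using mean embeddings only.

    Assumption: "mean-only" means the variance outputs are ignored and
    classes are ranked by cosine similarity of the mean vectors.
    """
    image_unit = l2_normalize(image_mean)
    sims = []
    for class_mean in class_means:
        class_unit = l2_normalize(class_mean)
        sims.append(sum(a * b for a, b in zip(image_unit, class_unit)))
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims


# Toy 3-d mean embeddings (hypothetical values, not model outputs).
image_mean = [0.9, 0.1, 0.2]
class_means = [
    [1.0, 0.0, 0.1],  # stands in for the prompt "a photo of a cat"
    [0.0, 1.0, 0.3],  # stands in for the prompt "a photo of a dog"
]

pred, sims = mean_only_zero_shot(image_mean, class_means)
print("Predicted class index:", pred)
print("Cosine similarities:", sims)
```

With real model outputs, the mean embeddings would presumably be the `image_features` and `text_features` tensors, and the same ranking can be done with a matrix product followed by `argmax`.
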
```python
import requests
from PIL import Image

import torch
from transformers import CLIPProcessor

# hf_models is part of the PCME++ repository:
# https://github.com/naver-ai/pcmepp/tree/main/hf_models
from hf_models import HfPCMEPPModel, tokenize


# Image preprocessing comes from the CLIP ViT-B/16 processor.
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
# IN-top1: 34.64%
model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
# IN-top1: 41.81%
# model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")


# Load a sample COCO image and two candidate captions.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt", padding=True)
texts = ["a photo of a cat", "a photo of a dog"]
texts = tokenize(texts)

# Inference only, so gradients are not needed.
with torch.no_grad():
    outputs = model(images=inputs["pixel_values"], texts=texts)

# Image-text similarity scores, shape (num_images, num_texts).
print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
# Exponentiate the predicted stds and average over the embedding dimension.
print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
```
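To make the printed quantities easier to interpret, here is a minimal sketch of the two post-processing steps on toy numbers: a softmax that turns similarity scores into probabilities over the captions, and the averaged standard deviation used as a scalar uncertainty. It assumes, as the `torch.exp` calls in the snippet above suggest, that `image_stds` and `text_stds` hold log standard deviations. The helper names (`softmax`, `mean_std`) and all values are hypothetical, and any temperature or logit scale the model may use is not applied here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_std(log_stds):
    """Average standard deviation, assuming the inputs are log-std values."""
    return sum(math.exp(s) for s in log_stds) / len(log_stds)


# Hypothetical numbers standing in for one image scored against two captions.
logits = [0.31, 0.22]
image_log_stds = [-2.0, -1.5, -2.5]

probs = softmax(logits)
uncertainty = mean_std(image_log_stds)
print("Probabilities:", probs)
print("Image uncertainty:", uncertainty)
```

In PyTorch the same steps would be `logits.softmax(dim=-1)` and `torch.exp(log_stds).mean(dim=-1)`. A larger averaged value indicates a wider predicted distribution for that input.
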

```
@inproceedings{
    chun2024pcmepp,
    title={Improved Probabilistic Image-Text Representations},
    author={Sanghyuk Chun},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=ft1mr3WlGM}
}
```