---
tags:
- pytorch_model_hub_mixin
- model_hub_mixin
license: mit
---
### Official implementation of the PCME++ model pre-trained on CC3M, CC12M, and RedCaps.
Zero-shot ImageNet-1k top-1 accuracy: 34.642% (slightly better than the paper score, 34.22%)
- Paper: https://openreview.net/forum?id=ft1mr3WlGM
- GitHub: https://github.com/naver-ai/pcmepp
- A stronger checkpoint with 41.812% zero-shot ImageNet-1k top-1 accuracy (mean-only ZS classification) is available at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M)
```python
import requests
from PIL import Image
import torch
from transformers import CLIPProcessor
# Check hf_models code here: https://github.com/naver-ai/pcmepp/tree/main/hf_models
from hf_models import HfPCMEPPModel, tokenize
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
# IN-top1: 34.64%
model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
# IN-top1: 41.81%
# model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")
# Download a sample COCO validation image and preprocess it
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt", padding=True)
# Tokenize the candidate captions with the tokenizer shipped in hf_models
texts = ["a photo of a cat", "a photo of a dog"]
texts = tokenize(texts)
outputs = model(images=inputs["pixel_values"], texts=texts)
# Image-text similarity from the mean embeddings
print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
# Per-sample uncertainty, averaged over the embedding dimension
print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
```
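The logits printed above can be turned into a zero-shot prediction by picking the caption whose mean embedding is most similar to the image's mean embedding. The following is a minimal, dependency-free sketch of that idea, assuming "mean-only" classification means the uncertainty outputs are ignored. The function names `l2_normalize` and `mean_only_zero_shot` and the toy 3-dimensional vectors are hypothetical and not part of `hf_models`; the softmax here uses no temperature, which may differ from how the model scales its logits.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_only_zero_shot(image_mean, text_means, labels):
    """Pick the label whose text mean embedding is closest to the image mean.

    Hypothetical helper: uses cosine similarity between mean embeddings only
    and ignores any uncertainty (std) outputs.
    """
    img = l2_normalize(image_mean)
    # Cosine similarity between the image and every candidate caption
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_means]
    # Numerically stable softmax over the similarities (no temperature)
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: sims[i])
    return labels[best], probs


# Toy stand-ins for outputs["image_features"][0] and outputs["text_features"]
image_mean = [0.9, 0.1, 0.0]
text_means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = ["a photo of a cat", "a photo of a dog"]

label, probs = mean_only_zero_shot(image_mean, text_means, labels)
print(label, probs)
```

With real model outputs, the same argmax can be taken directly over the rows of the logits tensor.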
```
@inproceedings{
chun2024pcmepp,
title={Improved Probabilistic Image-Text Representations},
author={Sanghyuk Chun},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=ft1mr3WlGM}
}
```