SanghyukChun / PCMEPP-ViT-B-16-CC3M-12M-RedCaps

	---
	tags:
	- pytorch_model_hub_mixin
	- model_hub_mixin
	---

	### Official PCME++ model pre-trained on CC3M, CC12M, and RedCaps

	Zero-shot ImageNet-1k top-1 accuracy: 34.642% (slightly higher than the 34.22% reported in the paper)

	- Paper: https://openreview.net/forum?id=ft1mr3WlGM
	- GitHub: https://github.com/naver-ai/pcmepp

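	```python
	import requests
	from PIL import Image

	import torch
	from transformers import CLIPProcessor

	# Check hf_models code here: https://github.com/naver-ai/pcmepp/tree/main/hf_models
	from hf_models import HfPCMEPPModel, tokenize


	# The CLIP processor is used here only for image preprocessing;
	# text goes through the repository's own tokenize() helper.
	processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
	model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")


	# Download a sample COCO image and make sure it is 3-channel RGB.
	url = "http://images.cocodataset.org/val2017/000000039769.jpg"
	response = requests.get(url, stream=True, timeout=10)
	response.raise_for_status()
	image = Image.open(response.raw).convert("RGB")
	inputs = processor(images=image, return_tensors="pt", padding=True)
	texts = ["a photo of a cat", "a photo of a dog"]
	text_tokens = tokenize(texts)

	# Inference only, so skip gradient tracking.
	with torch.no_grad():
	    outputs = model(images=inputs["pixel_values"], texts=text_tokens)

	# Image-to-text similarity matrix: one row per image, one column per text.
	print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
	# Per-sample uncertainty: exp() of the predicted stds, averaged over dimensions.
	print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
	print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
	```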
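
The printed values can be post-processed without the model. The sketch below is illustrative only: it uses made-up numbers in place of real model outputs, and the helpers `rank_texts` and `mean_uncertainty` are hypothetical names that are not part of `hf_models`. It assumes that `image_stds` and `text_stds` hold log standard deviations, which the `torch.exp` call above suggests but the card does not state.

```python
import math


def rank_texts(logits_row, texts):
    """Return (text, score) pairs sorted from best to worst match."""
    return sorted(zip(texts, logits_row), key=lambda pair: pair[1], reverse=True)


def mean_uncertainty(log_stds):
    """Average exp(log_std) over the embedding dimensions (assumed log-scale input)."""
    return sum(math.exp(value) for value in log_stds) / len(log_stds)


# Made-up values standing in for one row of the logits matrix and one stds vector.
texts = ["a photo of a cat", "a photo of a dog"]
logits_row = [0.31, 0.12]
log_stds = [-2.0, -1.5, -2.5]

ranking = rank_texts(logits_row, texts)
best_text, best_score = ranking[0]
uncertainty = mean_uncertainty(log_stds)

print("Best match:", best_text, best_score)
print("Mean uncertainty:", uncertainty)
```

With real outputs, a row of the logits tensor or a stds vector can be turned into a plain list with `.tolist()` before being passed to these helpers.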

	```
	@inproceedings{
	chun2024pcmepp,
	title={Improved Probabilistic Image-Text Representations},
	author={Sanghyuk Chun},
	booktitle={The Twelfth International Conference on Learning Representations},
	year={2024},
	url={https://openreview.net/forum?id=ft1mr3WlGM}
	}
	```