---
tags:
- model_hub_mixin
---

### Official implementation of the PCME++ pre-trained model on CC3M, CC12M, and RedCaps

Zero-shot ImageNet-1k top-1 accuracy: 41.812% (trained with longer training iterations than the previous version).

- Paper: https://openreview.net/forum?id=ft1mr3WlGM
- GitHub: https://github.com/naver-ai/pcmepp
- The previous version, with 34.642% ImageNet-1k top-1 accuracy (mean-only zero-shot classification), is available at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps)

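"Mean-only ZS classification" evaluates the model using only the mean embeddings: each ImageNet class is embedded from a text prompt, and an image is assigned to the class whose text embedding has the highest cosine similarity. A minimal sketch with random stand-in features (shapes and names are illustrative, not the actual evaluation code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for mean embeddings (illustrative shapes; a real evaluation would
# take them from the PCME++ model, with prompts like "a photo of a {class}").
image_feats = F.normalize(torch.randn(4, 512), dim=-1)     # 4 images
class_feats = F.normalize(torch.randn(1000, 512), dim=-1)  # 1000 ImageNet classes

# Mean-only zero-shot classification: cosine similarity (dot product of
# L2-normalized means), then argmax over classes.
logits = image_feats @ class_feats.T  # shape (4, 1000)
preds = logits.argmax(dim=-1)         # predicted class index per image
print(preds)
```
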
```python
import requests
from PIL import Image

import torch
from transformers import CLIPProcessor

# Check the hf_models code here: https://github.com/naver-ai/pcmepp/tree/main/hf_models
from hf_models import HfPCMEPPModel, tokenize


processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
# IN-top1: 34.64%
# model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
# IN-top1: 41.81%
model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")


url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt", padding=True)
texts = ["a photo of a cat", "a photo of a dog"]
texts = tokenize(texts)

outputs = model(images=inputs["pixel_values"], texts=texts)
print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
print("Image uncertainty:", torch.exp(outputs["image_stds"]).mean(dim=-1))
print("Text uncertainty:", torch.exp(outputs["text_stds"]).mean(dim=-1))
```
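The `*_stds` outputs let embeddings be compared probabilistically rather than by a single point. A minimal sketch of the closed-form expected squared distance between two diagonal Gaussians, the quantity PCME++'s probabilistic matching builds on, using random tensors as stand-ins for the model's outputs — treating `*_stds` as log-σ is an assumption here, not confirmed by this card:

```python
import torch

def expected_sq_distance(mu1, log_sigma1, mu2, log_sigma2):
    """Closed-form E[||z1 - z2||^2] for z_i ~ N(mu_i, diag(sigma_i^2)).

    By a standard Gaussian identity, this equals the squared distance
    between the means plus the summed variances of both distributions.
    """
    var1 = torch.exp(2 * log_sigma1)
    var2 = torch.exp(2 * log_sigma2)
    return ((mu1 - mu2) ** 2).sum(dim=-1) + (var1 + var2).sum(dim=-1)

# Stand-in tensors (hypothetical; a real run would use outputs["image_features"],
# outputs["image_stds"], etc. from the model above).
torch.manual_seed(0)
mu_img, std_img = torch.randn(2, 512), torch.randn(2, 512) * 0.1
mu_txt, std_txt = torch.randn(2, 512), torch.randn(2, 512) * 0.1

d = expected_sq_distance(mu_img, std_img, mu_txt, std_txt)
print(d)  # smaller = closer match; the variance terms act as an uncertainty penalty
```

Note how an uncertain embedding (large σ) increases its distance to everything, which is what makes the uncertainty usable for ranking or abstention.
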
45
+
46
+ ```
47
+ @inproceedings{
48
+ chun2024pcmepp,
49
+ title={Improved Probabilistic Image-Text Representations},
50
+ author={Sanghyuk Chun},
51
+ booktitle={The Twelfth International Conference on Learning Representations},
52
+ year={2024},
53
+ url={https://openreview.net/forum?id=ft1mr3WlGM}
54
+ }
55
+ ```