Commit be6e736 by SanghyukChun
1 parent: e5c6184

Push model using huggingface_hub.

Files changed (3)
  1. README.md +6 -50
  2. config.json +18 -0
  3. model.safetensors +3 -0
README.md CHANGED
@@ -1,53 +1,9 @@
  ---
- license: mit
+ tags:
+ - pytorch_model_hub_mixin
+ - model_hub_mixin
  ---
 
- ### Official implementation of PCME++ pre-trained model on CC3M, CC12M and RedCaps.
-
- Zero-shot ImageNet-1k top-1 accuracy: 41.812% (with longer training iterations than the previous version)
-
- - Paper: https://openreview.net/forum?id=ft1mr3WlGM
- - GitHub: https://github.com/naver-ai/pcmepp
- - Check the official version with ImageNet-1k top-1 accuracy 34.642% (mean-only ZS classification) at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps)
-
-
- ```python
- import requests
- from PIL import Image
-
- import torch
- from transformers import CLIPProcessor
-
- # Check hf_models code here: https://github.com/naver-ai/pcmepp/tree/main/hf_models
- from hf_models import HfPCMEPPModel, tokenize
-
-
- processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
- # IN-top1: 34.64%
- # model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
- # IN-top1: 41.81%
- model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")
-
-
- url = "http://images.cocodataset.org/val2017/000000039769.jpg"
- image = Image.open(requests.get(url, stream=True).raw)
- inputs = processor(images=image, return_tensors="pt", padding=True)
- texts = ["a photo of a cat", "a photo of a dog"]
- texts = tokenize(texts)
-
- outputs = model(images=inputs["pixel_values"], texts=texts)
- print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
- print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
- print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
- ```
-
- ```
- @inproceedings{
- chun2024pcmepp,
- title={Improved Probabilistic Image-Text Representations},
- author={Sanghyuk Chun},
- booktitle={The Twelfth International Conference on Learning Representations},
- year={2024},
- url={https://openreview.net/forum?id=ft1mr3WlGM}
- }
- ```
+ This model has been pushed to the Hub using ****:
+ - Repo: [More Information Needed]
+ - Docs: [More Information Needed]
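
The usage snippet removed from README.md printed two quantities: similarity logits (`image_features @ text_features.T`) and a per-sample uncertainty (`torch.exp(...).mean(dim=-1)`). The following is a minimal, dependency-free sketch of that arithmetic only. It assumes the `*_stds` outputs are log-scale values (the `exp` call suggests this, but the diff does not confirm it). The helper names and toy numbers are illustrative and are not part of `hf_models`.

```python
import math


def similarity_logits(image_features, text_features):
    """Dot product of every image embedding with every text embedding."""
    return [
        [sum(i * t for i, t in zip(img, txt)) for txt in text_features]
        for img in image_features
    ]


def mean_uncertainty(log_stds):
    """Mean of exp(value) over the embedding dimension, one score per row."""
    return [sum(math.exp(v) for v in row) / len(row) for row in log_stds]


# Toy 2-dimensional embeddings (made-up numbers, not model outputs).
image_features = [[1.0, 0.0]]
text_features = [[0.8, 0.6], [0.0, 1.0]]
image_stds = [[0.0, 0.0]]  # assumed log-scale; exp(0) = 1 per dimension

logits = similarity_logits(image_features, text_features)
uncertainty = mean_uncertainty(image_stds)
print(logits)       # [[0.8, 0.0]]
print(uncertainty)  # [1.0]
```

With the real model, the tensor expressions in the removed snippet perform the same operations in batched form.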
config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "embed_dim": 512,
+   "text_cfg": {
+     "context_length": 77,
+     "heads": 8,
+     "layers": 12,
+     "unc_layers": 2,
+     "vocab_size": 49408,
+     "width": 512
+   },
+   "vision_cfg": {
+     "image_size": 224,
+     "layers": 12,
+     "patch_size": 16,
+     "unc_layers": 2,
+     "width": 768
+   }
+ }
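
To make the added config.json easier to read, here is a small sketch that derives a few quantities from it. It assumes the usual ViT convention of non-overlapping square patches and an even split of the text width across attention heads. How the PCME++ code actually consumes these fields is not shown in this commit, and the `summarize` helper is hypothetical.

```python
import json

# The config.json contents added by this commit, inlined for a self-contained example.
CONFIG_JSON = """
{
  "embed_dim": 512,
  "text_cfg": {"context_length": 77, "heads": 8, "layers": 12,
               "unc_layers": 2, "vocab_size": 49408, "width": 512},
  "vision_cfg": {"image_size": 224, "layers": 12, "patch_size": 16,
                 "unc_layers": 2, "width": 768}
}
"""


def summarize(config):
    """Derive patch-grid and head sizes under standard ViT assumptions."""
    vision, text = config["vision_cfg"], config["text_cfg"]
    patches_per_side = vision["image_size"] // vision["patch_size"]
    return {
        "patches_per_side": patches_per_side,
        "num_patches": patches_per_side ** 2,
        "text_head_dim": text["width"] // text["heads"],
    }


summary = summarize(json.loads(CONFIG_JSON))
print(summary)  # {'patches_per_side': 14, 'num_patches': 196, 'text_head_dim': 64}
```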
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:751c23b80a2b269a7b7c235636f8523b0ca4f5ca29dc0b725fd5b991787a5e10
+ size 683066852
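
The three lines added for model.safetensors are a Git LFS pointer, not the weights themselves. As a rough sketch, a downloaded file could be checked against such a pointer as follows. The function names are hypothetical, and the file path passed to `matches_pointer` is up to the caller.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a large weights file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:751c23b80a2b269a7b7c235636f8523b0ca4f5ca29dc0b725fd5b991787a5e10
size 683066852
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 683066852
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then confirm that the download matches the size and hash recorded in this commit.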