apf1 committed bd1bf88 (1 parent: 3e167ad): Update README.md

---
license: other
license_name: apple-sample-code-license
license_link: LICENSE
---

A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-2B.
Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data.
This model was trained on 2B image-text pairs filtered from a pool of 12.8B uncurated pairs (CommonPool-12.8B).
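
As a conceptual illustration of that filtering step (not Apple's released pipeline or code), the sketch below assumes a hypothetical `filter_pool` helper and a CLIP-style filtering network exposing `encode_image`/`encode_text`; it keeps only the top-scoring fraction of candidate pairs, roughly mirroring the 2B-of-12.8B ratio.

```python
# Conceptual sketch of DFN-style filtering (illustrative only, not the
# released DFN-2B pipeline): score each candidate image-text pair with a
# CLIP-style filtering network and keep the top-scoring fraction.
import torch
import torch.nn.functional as F

def filter_pool(dfn, images, texts, keep_fraction=0.15):
    """Return indices of the highest-scoring pairs in an uncurated pool.

    `dfn` is assumed to expose CLIP-style encode_image/encode_text;
    `images`/`texts` are pre-batched tensors (hypothetical shapes).
    keep_fraction ~= 0.15 mirrors keeping ~2B of 12.8B pairs.
    """
    with torch.no_grad():
        img_emb = F.normalize(dfn.encode_image(images), dim=-1)
        txt_emb = F.normalize(dfn.encode_text(texts), dim=-1)
        scores = (img_emb * txt_emb).sum(dim=-1)  # per-pair cosine similarity
    k = max(1, int(keep_fraction * scores.numel()))
    return scores.topk(k).indices
```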

This model has been converted to PyTorch from the original JAX checkpoints produced with AXLearn (https://github.com/apple/axlearn).
These weights are directly usable in OpenCLIP (image + text).

## Model Details

- **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification.
- **Dataset:** DFN-2B
- **Papers:**
  - Data Filtering Networks: https://arxiv.org/abs/2309.17425
- **Examples Seen:** 39B

## Model Metrics

| Eval Dataset           |      Metric |
|:-----------------------|------------:|
| ImageNet 1k            |      0.8219 |
| Caltech-101            |      0.9500 |
| CIFAR-10               |      0.9864 |
| CIFAR-100              |      0.8934 |
| CLEVR Counts           |      0.3403 |
| CLEVR Distance         |      0.2321 |
| Country211             |      0.3198 |
| Describable Textures   |      0.6681 |
| EuroSAT                |      0.6819 |
| FGVC Aircraft          |      0.4829 |
| Food-101               |      0.9498 |
| GTSRB                  |      0.6329 |
| ImageNet Sketch        |      0.7043 |
| ImageNet v2            |      0.7570 |
| ImageNet-A             |      0.6745 |
| ImageNet-O             |      0.3605 |
| ImageNet-R             |      0.9184 |
| KITTI Vehicle Distance |      0.2391 |
| MNIST                  |      0.8745 |
| ObjectNet              |      0.7477 |
| Oxford Flowers-102     |      0.8784 |
| Oxford-IIIT Pet        |      0.9611 |
| Pascal VOC 2007        |      0.8472 |
| PatchCamelyon          |      0.6418 |
| Rendered SST2          |      0.5815 |
| RESISC45               |      0.7300 |
| Stanford Cars          |      0.9465 |
| STL-10                 |      0.9889 |
| SUN397                 |      0.7594 |
| SVHN                   |      0.6573 |
| Flickr                 |      0.8467 |
| MSCOCO                 |      0.5957 |
| WinoGAViL              |      0.5551 |
| iWildCam               |      0.1857 |
| Camelyon17             |      0.6540 |
| FMoW                   |      0.1824 |
| Dollar Street          |      0.6822 |
| GeoDE                  |      0.9253 |
| **Average**            | **0.68039** |

## Model Usage

### With OpenCLIP

```python
import torch
import torch.nn.functional as F
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, preprocess = create_model_from_pretrained('hf-hub:apple/DFN2B-CLIP-ViT-L-14')
tokenizer = get_tokenizer('ViT-L-14')

image = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
image = preprocess(image).unsqueeze(0)

labels_list = ["a dog", "a cat", "a donut", "a beignet"]
text = tokenizer(labels_list, context_length=model.context_length)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # This checkpoint is a standard contrastive CLIP model (it has no
    # logit_bias), so label probabilities come from a softmax over the
    # temperature-scaled cosine similarities.
    text_probs = (image_features @ text_features.T * model.logit_scale.exp()).softmax(dim=-1)

zipped_list = list(zip(labels_list, [round(p.item(), 3) for p in text_probs[0]]))
print("Label probabilities: ", zipped_list)
```
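
The metrics above are zero-shot results. As an illustrative follow-up (not the exact evaluation code behind the table), the sketch below reuses `model`, `tokenizer`, `F`, and `image` from the snippet above to build a prompt-based zero-shot classifier; the class names and the single prompt template are made up, and benchmark numbers typically use larger prompt ensembles.

```python
# Zero-shot classification sketch, reusing objects defined in the previous
# snippet. Class names and the prompt template are illustrative only.
class_names = ["golden retriever", "tabby cat", "beignet"]
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    class_emb = F.normalize(
        model.encode_text(tokenizer(prompts, context_length=model.context_length)), dim=-1
    )
    img_emb = F.normalize(model.encode_image(image), dim=-1)
    probs = (img_emb @ class_emb.T * model.logit_scale.exp()).softmax(dim=-1)

print({name: round(p, 3) for name, p in zip(class_names, probs[0].tolist())})
```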

## Citation

```bibtex
@article{fang2023data,
  title={Data Filtering Networks},
  author={Fang, Alex and Jose, Albin Madappally and Jain, Amit and Schmidt, Ludwig and Toshev, Alexander and Shankar, Vaishaal},
  journal={arXiv preprint arXiv:2309.17425},
  year={2023}
}
```