elliesleightholm and Xenova (HF staff) committed
Commit ef343cf (parent: 8385fef)

Upload ONNX weights (+ quantizations) + Transformers.js support (#1)


- Upload ONNX weights (+ quantizations) (31a9a46f0afbbbe34a1668adcd72a03ce37d5b9b)
- Create config.json (a05d487c0285b3c35898c17db40f374058b42d81)
- Create preprocessor_config.json (8690aa35ac67dacfbc214cdff245602eaa83b5e2)
- Update tokenizer_config.json (cc93df47ec2c676dc83e0d3e92e3f56f57366066)
- Update README.md (4a45991a7a637ae0cf9ff467bcd6db7e166247d7)


Co-authored-by: Joshua <Xenova@users.noreply.huggingface.co>

README.md CHANGED
@@ -5,6 +5,7 @@ tags:
 - fashion
 - multimodal retrieval
 - siglip
+- transformers.js
 library_name: open_clip
 pipeline_tag: zero-shot-image-classification
 license: apache-2.0
@@ -25,6 +26,9 @@ The model was fine-tuned from ViT-B-16-SigLIP (webli).
 
 
 ## Usage
+
+### OpenCLIP
+
 The model can be seamlessly used with [OpenCLIP](https://github.com/mlfoundations/open_clip) by
 
 ```python
@@ -49,6 +53,55 @@ with torch.no_grad(), torch.cuda.amp.autocast():
 print("Label probs:", text_probs)
 ```
 
+### Transformers.js
+
+You can also run the model in JavaScript with the [Transformers.js](https://huggingface.co/docs/transformers.js) library.
+
+First, install it from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
+
+```bash
+npm i @huggingface/transformers
+```
+
+Then, compute embeddings as follows:
+```js
+import { SiglipTextModel, SiglipVisionModel, AutoTokenizer, AutoProcessor, RawImage, softmax, dot } from '@huggingface/transformers';
+
+const model_id = 'Marqo/marqo-fashionSigLIP';
+
+// Load tokenizer and text model
+const tokenizer = await AutoTokenizer.from_pretrained(model_id);
+const text_model = await SiglipTextModel.from_pretrained(model_id);
+
+// Load processor and vision model
+const processor = await AutoProcessor.from_pretrained(model_id);
+const vision_model = await SiglipVisionModel.from_pretrained(model_id);
+
+// Run tokenization
+const texts = ['a hat', 'a t-shirt', 'shoes'];
+const text_inputs = tokenizer(texts, { padding: 'max_length', truncation: true });
+
+// Compute text embeddings
+const { text_embeds } = await text_model(text_inputs);
+
+// Read image and run processor
+const image = await RawImage.read('https://raw.githubusercontent.com/marqo-ai/marqo-FashionCLIP/main/docs/fashion-hippo.png');
+const image_inputs = await processor(image);
+
+// Compute vision embeddings
+const { image_embeds } = await vision_model(image_inputs);
+
+// Compute similarity scores
+const normalized_text_embeds = text_embeds.normalize().tolist();
+const normalized_image_embeds = image_embeds.normalize().tolist()[0];
+
+const text_probs = softmax(normalized_text_embeds.map((text_embed) =>
+    100.0 * dot(normalized_image_embeds, text_embed)
+));
+console.log(text_probs);
+// [0.9860219105287394, 0.00777916527489097, 0.006198924196369721]
+```
+
 ## Benchmark Results
 Average evaluation results on 6 public multimodal fashion datasets ([Atlas](https://huggingface.co/datasets/Marqo/atlas), [DeepFashion (In-shop)](https://huggingface.co/datasets/Marqo/deepfashion-inshop), [DeepFashion (Multimodal)](https://huggingface.co/datasets/Marqo/deepfashion-multimodal), [Fashion200k](https://huggingface.co/datasets/Marqo/fashion200k), [KAGL](https://huggingface.co/datasets/Marqo/KAGL), and [Polyvore](https://huggingface.co/datasets/Marqo/polyvore)) are reported below:
 
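
Note: the hunk above only shows the edges of the README's open_clip snippet (its context lines end at `with torch.no_grad(), torch.cuda.amp.autocast():` and `print("Label probs:", text_probs)`). As a minimal sketch of typical open_clip usage consistent with that context, not necessarily the exact snippet in the README, the image path and candidate labels below are illustrative assumptions:

```python
import torch
from PIL import Image
import open_clip

# Load the checkpoint from the Hugging Face Hub via open_clip (library_name: open_clip)
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionSigLIP')
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionSigLIP')

image = preprocess(Image.open('fashion-hippo.png')).unsqueeze(0)  # assumed local image path
text = tokenizer(['a hat', 'a t-shirt', 'shoes'])                 # assumed candidate labels

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
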
config.json ADDED
@@ -0,0 +1,3 @@
+{
+  "model_type": "siglip"
+}
onnx/text_model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c1d501b23bddf27ba828c037b0780e44fdf47ca4c0b925ef190ab5bcf7aaf6e6
+size 441361402
onnx/text_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4be58194517cdafe1695c2675259ffdba8871b77f8c5828cd45598177453f5d5
+size 173734396
onnx/text_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0f5fcccd805e8910663dcbd6821c3cbe040bbe508c656964de08736272228806
+size 220817780
onnx/text_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f5a7995e029c6ee9346fa46857661c4a171d28b164fa7703b85a527d73adf170
+size 111125229
onnx/text_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1f62fd2b046d82ec1ed91881fe9f4f0b34d4e11bae539a33092712031ffee129
+size 178600156
onnx/text_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e48e36d0acdc1579aac8edb3c593a3f2ef9d55f7646b9e3acd7baf8f00d7d0ce
+size 108904023
onnx/text_model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f5a7995e029c6ee9346fa46857661c4a171d28b164fa7703b85a527d73adf170
+size 111125229
onnx/text_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e45a29f61825b0fdc4ab2648c84791d6af1b63fe86ce7bd7c7fee43fc3b1c4d
+size 111125261
onnx/vision_model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7e773846b27a699c45ba7e3978514b7fca420662d7e69e3b9226982f09f4a13
+size 371715502
onnx/vision_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7724073a1af61260bc4d250492ff1d2bfbf2e4d4e171e90c5044186e2198948
+size 55430656
onnx/vision_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a6d3e644416e543c62344c06f7dc8be5687633901a2931faccbe28688e30737e
+size 185947013
onnx/vision_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dae6e934667af50590c3d0ca69c7b14edc71f4df1b4f670e5ba6bc623495b691
+size 93973410
onnx/vision_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a864016eff2d8829ec6088b1593c3c9e75a70e0413e0f02d4a4bbfddc3ef89d3
+size 61181030
onnx/vision_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:988c55937d35228f860bcc90742940ec3efe8d00054518249f6c82b66e1b4a7c
+size 53686874
onnx/vision_model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f1009cfce0eedd409f601e8351eedab72e8529641105eee6517821a9a634a2f4
+size 93973443
onnx/vision_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f1009cfce0eedd409f601e8351eedab72e8529641105eee6517821a9a634a2f4
+size 93973443
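
The files above are Git LFS pointers to the exported ONNX text/vision encoders and their quantized variants. As a minimal sketch, assuming `huggingface_hub` and `onnxruntime` are installed and without asserting the exact input names of these exports, one way to fetch a variant and inspect its expected inputs is:

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Download one of the quantized text encoders from the Hub (cached locally)
model_path = hf_hub_download("Marqo/marqo-fashionSigLIP", "onnx/text_model_quantized.onnx")

session = ort.InferenceSession(model_path)
# Inspect the input/output signatures instead of hard-coding them
print([(i.name, i.shape) for i in session.get_inputs()])
print([(o.name, o.shape) for o in session.get_outputs()])
```
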
preprocessor_config.json ADDED
@@ -0,0 +1,23 @@
+{
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_processor_type": "SiglipImageProcessor",
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "processor_class": "SiglipProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 224,
+    "width": 224
+  },
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ]
+}
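
These values describe the standard SigLIP preprocessing: bicubic resize to 224x224, rescale by 1/255, then normalize with per-channel mean and std of 0.5 (mapping pixels to roughly [-1, 1]). A minimal NumPy/Pillow sketch of the equivalent transform, for illustration only:

```python
import numpy as np
from PIL import Image

# Values taken from preprocessor_config.json
SIZE = (224, 224)                       # "size": {"height": 224, "width": 224}
RESCALE_FACTOR = 0.00392156862745098    # i.e. 1/255
IMAGE_MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
IMAGE_STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)

def preprocess(path: str) -> np.ndarray:
    """Resize, rescale to [0, 1], then normalize; returns a (1, 3, 224, 224) array."""
    image = Image.open(path).convert("RGB").resize(SIZE, resample=3)  # 3 = bicubic
    pixels = np.asarray(image, dtype=np.float32) * RESCALE_FACTOR
    pixels = (pixels - IMAGE_MEAN) / IMAGE_STD
    return pixels.transpose(2, 0, 1)[None]  # channels-first, with batch dimension
```
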
tokenizer_config.json CHANGED
@@ -931,7 +931,7 @@
   "eos_token": "</s>",
   "extra_ids": 100,
   "legacy": false,
-  "model_max_length": 1000000000000000019884624838656,
+  "model_max_length": 64,
   "pad_token": "</s>",
   "sp_model_kwargs": {},
   "tokenizer_class": "T5Tokenizer",