marqo-ecommerce-embeddings-B / README.md

Update README.md

1f36e4b verified 5 months ago

5.64 kB

	---
	license: apache-2.0
	---

	# Marqo E-commerce Embedding Models
	In this work, we introduce two state-of-the-art embedding models for e-commerce:
	Marqo-Ecommerce-B and Marqo-Ecommerce-L.
	They are over 30% better compared to Amazon Titan Embedding services for e-commerce retrieval tasks.

	Released Content:
	1) Marqo-Ecommerce-B and Marqo-Ecommerce-L embedding models
	2) GoogleShopping-1m and AmazonProducts-3m for evaluation
	3) Evaluation Code

	<img src="performance.png" alt="multi split visual" width="500"/>

	## Models

	\| Embedding Model \| #Params (m) \| Dimension \| HuggingFace \| Download .pt \|
	\|---------------------\| --- \|---------------\|------------------------------------\|-------------------------------------------------------------------------------------------------------------\|
	\| Marqo-Ecommerce-B \| 203 \| 768 \| Marqo/marqo-ecommerce-embeddings-B \| [link](https://marqo-gcl-public.s3.us-west-2.amazonaws.com/marqo-general-ecomm/marqo-ecomm-embeddings-b.pt) \|
	\| Marqo-Ecommerce-L \| 652 \| 1024 \| Marqo/marqo-ecommerce-embeddings-L \| [link](https://marqo-gcl-public.s3.us-west-2.amazonaws.com/marqo-general-ecomm/marqo-ecomm-embeddings-l.pt) \|

	### HuggingFace with OpenCLIP
	```
	pip install open_clip_torch
	```
	```python
	from PIL import Image
	import open_clip
	import requests
	import torch

	# Specify model from Hugging Face Hub
	model_name = 'hf-hub:Marqo/marqo-ecommerce-embeddings-B'
	model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(model_name)
	tokenizer = open_clip.get_tokenizer(model_name)

	# Preprocess the image and tokenize text inputs
	# Load an example image from a URL
	img = Image.open(requests.get('https://raw.githubusercontent.com/marqo-ai/marqo-FashionCLIP/main/docs/fashion-hippo.png', stream=True).raw)
	image = preprocess_val(img).unsqueeze(0)
	text = tokenizer(["a hat", "a t-shirt", "shoes"])

	# Perform inference
	with torch.no_grad(), torch.cuda.amp.autocast():
	image_features = model.encode_image(image, normalize=True)
	text_features = model.encode_text(text, normalize=True)

	# Calculate similarity probabilities
	text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

	# Display the label probabilities
	print("Label probs:", text_probs)
	# [9.9955e-01, 4.4712e-04, 4.4010e-06]]
	```
	### HuggingFace with transformers
	```python
	from transformers import AutoModel, AutoProcessor
	import torch
	from PIL import Image
	import requests
	# model_name= 'Marqo/marqo-ecommerce-embeddings-L'
	model_name = 'Marqo/marqo-ecommerce-embeddings-B'

	model_1 = AutoModel.from_pretrained(model_name, trust_remote_code=True)
	processor_1 = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

	img = Image.open(requests.get('https://raw.githubusercontent.com/marqo-ai/marqo-FashionCLIP/main/docs/fashion-hippo.png', stream=True).raw).convert("RGB")
	image_1 = [img]
	text_1 = ["a hat", "a t-shirt", "shoes"]
	processed_1 = processor_1(text=text_1, images=image_1, padding='max_length', return_tensors="pt")
	processor_1.image_processor.do_rescale = False
	with torch.no_grad():
	image_features_1 = model_1.get_image_features(processed_1['pixel_values'], normalize=True)
	text_features_1 = model_1.get_text_features(processed_1['input_ids'], normalize=True)

	text_probs_1 = (100 * image_features_1 @ text_features_1.T).softmax(dim=-1)

	print(text_probs_1)
	# [9.9955e-01, 4.4712e-04, 4.4010e-06]]
	```

	### Evaluation with GCL
	```
	git clone https://github.com/marqo-ai/GCL
	```
	Install the packages required by GCL.
	```
	cd ./GCL
	MODEL=hf-hub:Marqo/marqo-ecommerce-B
	outdir=/MarqoModels/GE/marqo-ecommerce-B/gs-title2image2
	hfdataset=Marqo/google-shopping-general-eval
	python evals/eval_hf_datasets_v1.py \
	--model_name $MODEL \
	--hf-dataset $hfdataset \
	--output-dir $outdir \
	--batch-size 1024 \
	--num_workers 8 \
	--left-key "['title']" \
	--right-key "['image']" \
	--img-or-txt "[['txt'], ['img']]" \
	--left-weight "[1]" \
	--right-weight "[1]" \
	--run-queries-cpu \
	--top-q 4000 \
	--doc-id-key item_ID \
	--context-length "[[64], [0]]"
	```


	## Detailed Performance
	GoogleShopping-Text2Image Retrieval.

	\| Embedding Model \| mAP \| P@10 \| R@10 \| MRR \|
	\| --- \| --- \| --- \| --- \| --- \|
	\| Marqo-Ecommerce-L \| 0.682 \| 0.089 \| 0.878 \| 0.683 \|
	\| Marqo-Ecommerce-B \| 0.623 \| 0.084 \| 0.832 \| 0.624 \|
	\| Amazon-Titan-MultiModal \| 0.475 \| 0.065 \| 0.648 \| 0.475 \|
	\| ViT-B-16-SigLip \| 0.476 \| 0.067 \| 0.660 \| 0.477 \|
	\| ViT-L-16-SigLip \| 0.540 \| 0.073 \| 0.722 \| 0.540 \|


	GoogleShopping-Category2Image Retrieval.

	\| Embedding Model \| mAP \| P@10 \| MRR \| nDCG@10 \|
	\| --- \| --- \| --- \| --- \| --- \|
	\| Marqo-Ecommerce-L \| 0.463 \| 0.652 \| 0.822 \| 0.666 \|
	\| Marqo-Ecommerce-B \| 0.423 \| 0.629 \| 0.810 \| 0.644 \|
	\| Amazon-Titan-MultiModal \| 0.246 \| 0.429 \| 0.642 \| 0.446 \|
	\| ViT-B-16-SigLip \| 0.277 \| 0.458 \| 0.660 \| 0.473 \|
	\| ViT-L-16-SigLip \| 0.324 \| 0.497 \| 0.687 \| 0.509 \|

	AmazonProducts-Text2Image Retrieval.

	\| Embedding Model \| mAP \| P@10 \| R@10 \| MRR \|
	\| --- \| --- \| --- \| --- \| --- \|
	\| Marqo-Ecommerce-L \| 0.658 \| 0.096 \| 0.854 \| 0.663 \|
	\| Marqo-Ecommerce-B \| 0.592 \| 0.089 \| 0.795 \| 0.597 \|
	\| Amazon-Titan-MultiModal \| 0.456 \| 0.064 \| 0.627 \| 0.457 \|
	\| ViT-B-16-SigLip \| 0.480 \| 0.070 \| 0.650 \| 0.484 \|
	\| ViT-L-16-SigLip \| 0.544 \| 0.077 \| 0.715 \| 0.548 \|