gmastrapas committed
Commit 44077eb
Parent: 25ca911
docs: minor README fixes
README.md CHANGED
@@ -148,17 +148,17 @@ Multimodal embeddings enable searching and understanding data across different m
 Built upon [`jina-clip-v1`](https://huggingface.co/jinaai/jina-clip-v1) and our recently released [`jina-embeddings-v3`](https://huggingface.co/jinaai/jina-embeddings-v3), `jina-clip-v2` features several significant improvements:
 
 * **Improved Performance**: v2 shows a 3% performance improvement over v1 in both text-image and text-text retrieval tasks. Similar to v1, v2's text encoder can serve as an effective multilingual long-context dense retriever. It performs on par with our frontier model `jina-embeddings-v3` (currently the best multilingual embeddings under 1B parameters on MTEB).
-* **Multilingual Support**:
+* **Multilingual Support**: Using the same backbone as `jina-embeddings-v3` for the text tower, `jina-clip-v2` supports 89 languages for multilingual-image retrieval, showing up to 4% improvement compared to `nllb-clip-large-siglip` on multilingual image retrieval tasks.
 * **Higher Image Resolution**: v2 now supports 512x512 input image resolution, a significant increase from v1's 224x224. This higher resolution enables better processing of detailed images, improved feature extraction, and more accurate recognition of fine-grained visual elements.
 * **Matryoshka Representations**: v2 allows users to truncate the output dimensions of both text and image embeddings from 1024 down to 64, reducing storage and processing overhead while maintaining strong performance.
 
 Measuring 0.9B parameters, `jina-clip-v2` combines two powerful encoders:
-* the text encoder `
+* the text encoder `Jina-XLM-RoBERTa` (the backbone of `jina-embeddings-v3`) and
 * the vision encoder `EVA02-L14` (an efficient vision Transformer developed by BAAI).
 
 | FEATURE | TEXT ENCODER | IMAGE ENCODER |
 |-----------------------|-------------------------|------------------|
-| Base Model | Jina
+| Base Model | Jina-XLM-RoBERTa | EVA02-L |
 | Parameters | 561M | 304M |
 | Input Specification | 8,192 tokens (max) | 512×512 pixels |
 | Min Output Dimensions | 64 | 64 |
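The **Matryoshka Representations** bullet in the hunk above is easy to make concrete. A minimal sketch, assuming the model is loaded via `sentence-transformers` as in the README's usage examples; the `truncate_dim` argument and the manual re-normalization step are illustrative and not part of this commit:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Ask sentence-transformers to keep only the first 64 of the 1024
# output dimensions (Matryoshka truncation at load time).
model = SentenceTransformer(
    'jinaai/jina-clip-v2', trust_remote_code=True, truncate_dim=64
)
embeddings = model.encode(['beautiful sunset over the beach'])
print(embeddings.shape)  # (1, 64)

# Equivalent manual route: encode at full size, slice off the first 64
# dimensions, then re-normalize so cosine similarities stay comparable.
full_model = SentenceTransformer('jinaai/jina-clip-v2', trust_remote_code=True)
full = full_model.encode(['beautiful sunset over the beach'])
truncated = full[:, :64]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
```

Any output size between 64 and 1024 works the same way; smaller vectors trade a little retrieval accuracy for proportionally less storage and faster similarity search.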
@@ -330,12 +330,16 @@ sentences = [
 image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']
 
 # Encode text and images
-text_embeddings = model.encode(sentences)
-image_embeddings = model.encode(
+text_embeddings = model.encode(sentences, normalize_embeddings=True)
+image_embeddings = model.encode(
+    image_urls, normalize_embeddings=True
+) # also accepts PIL.Image.Image, local filenames, dataURI
 
 # Encode query text
 query = 'beautiful sunset over the beach' # English
-query_embeddings = model.encode(
+query_embeddings = model.encode(
+    query, prompt_name='retrieval.query', normalize_embeddings=True
+)
 ```
 </details>
 
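Because both `encode` calls above pass `normalize_embeddings=True`, the returned vectors have unit length and ranking the two images against the query reduces to a dot product. A short continuation of the snippet above, reusing its `query_embeddings`, `image_embeddings`, and `image_urls`; this ranking code is an illustration, not part of the commit:

```python
import numpy as np

# encode() returns a 1-D vector for a single string and a 2-D matrix for
# a list of inputs, so the product below yields one score per image.
# With unit-length embeddings, dot product equals cosine similarity.
scores = image_embeddings @ query_embeddings  # shape: (2,)
best = int(np.argmax(scores))
print(f'best match: {image_urls[best]} (score={scores[best]:.4f})')
```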
@@ -388,7 +392,7 @@ _, _, text_embeddings, image_embeddings = output
 
 ## License
 
-
+This model is licensed to download and run under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). It is available for commercial use via the [Jina Embeddings API](https://jina.ai/embeddings/), [AWS](https://aws.amazon.com/marketplace/pp/prodview-bfbctuqmky676), [Azure](https://azuremarketplace.microsoft.com/en-gb/marketplace/apps/jinaai.jina-clip-v2-vm?tab=Overview), and [GCP](https://console.cloud.google.com/marketplace/browse?hl=en&inv=1&invt=AbiFWQ&q=jina). To download for commercial use, please [contact us](https://jina.ai/contact-sales).
 
 
 ## Contact