gmastrapas committed on
Commit
44077eb
1 Parent(s): 25ca911

docs: minor README fixes

Files changed (1)
  1. README.md +11 -7
README.md CHANGED
@@ -148,17 +148,17 @@ Multimodal embeddings enable searching and understanding data across different m
  Built upon [`jina-clip-v1`](https://huggingface.co/jinaai/jina-clip-v1) and our recently released [`jina-embeddings-v3`](https://huggingface.co/jinaai/jina-embeddings-v3), `jina-clip-v2` features several significant improvements:
 
  * **Improved Performance**: v2 shows a 3% performance improvement over v1 in both text-image and text-text retrieval tasks. Similar to v1, v2's text encoder can serve as an effective multilingual long-context dense retriever. It performs on par with our frontier model `jina-embeddings-v3` (currently the best multilingual embeddings under 1B parameters on MTEB).
- * **Multilingual Support**: Powered by `jina-embeddings-v3` as the text tower, `jina-clip-v2` supports 89 languages for multilingual-image retrieval, showing up to 4% improvement compared to `nllb-clip-large-siglip` on multilingual image retrieval tasks.
+ * **Multilingual Support**: Using the same backbone as `jina-embeddings-v3` for the text tower, `jina-clip-v2` supports 89 languages for multilingual-image retrieval, showing up to 4% improvement compared to `nllb-clip-large-siglip` on multilingual image retrieval tasks.
  * **Higher Image Resolution**: v2 now supports 512x512 input image resolution, a significant increase from v1's 224x224. This higher resolution enables better processing of detailed images, improved feature extraction, and more accurate recognition of fine-grained visual elements.
  * **Matryoshka Representations**: v2 allows users to truncate the output dimensions of both text and image embeddings from 1024 down to 64, reducing storage and processing overhead while maintaining strong performance.
 
  Measuring 0.9B parameters, `jina-clip-v2` combines two powerful encoders:
- * the text encoder `jina-XLM-RoBERTa` (the backbone of `jina-embeddings-v3`) and
+ * the text encoder `Jina-XLM-RoBERTa` (the backbone of `jina-embeddings-v3`) and
  * the vision encoder `EVA02-L14` (an efficient vision Transformer developed by BAAI).
 
  | FEATURE | TEXT ENCODER | IMAGE ENCODER |
  |-----------------------|-------------------------|------------------|
- | Base Model | Jina XLM-RoBERTa | EVA02-L |
+ | Base Model | Jina-XLM-RoBERTa | EVA02-L |
  | Parameters | 561M | 304M |
  | Input Specification | 8,192 tokens (max) | 512×512 pixels |
  | Min Output Dimensions | 64 | 64 |
@@ -330,12 +330,16 @@ sentences = [
  image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']
 
  # Encode text and images
- text_embeddings = model.encode(sentences)
- image_embeddings = model.encode(image_urls) # also accepts PIL.Image.Image, local filenames, dataURI
+ text_embeddings = model.encode(sentences, normalize_embeddings=True)
+ image_embeddings = model.encode(
+     image_urls, normalize_embeddings=True
+ ) # also accepts PIL.Image.Image, local filenames, dataURI
 
  # Encode query text
  query = 'beautiful sunset over the beach' # English
- query_embeddings = model.encode(query, prompt_name='retrieval.query')
+ query_embeddings = model.encode(
+     query, prompt_name='retrieval.query', normalize_embeddings=True
+ )
  ```
  </details>
 
@@ -388,7 +392,7 @@ _, _, text_embeddings, image_embeddings = output
 
  ## License
 
- `jina-clip-v2` is listed on AWS & Azure. If you need to use it beyond those platforms or on-premises within your company, note that the models is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to [contact us](https://jina.ai/contact-sales/).
+ This model is licensed to download and run under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). It is available for commercial use via the [Jina Embeddings API](https://jina.ai/embeddings/), [AWS](https://aws.amazon.com/marketplace/pp/prodview-bfbctuqmky676), [Azure](https://azuremarketplace.microsoft.com/en-gb/marketplace/apps/jinaai.jina-clip-v2-vm?tab=Overview), and [GCP](https://console.cloud.google.com/marketplace/browse?hl=en&inv=1&invt=AbiFWQ&q=jina). To download for commercial use, please [contact us](https://jina.ai/contact-sales).
 
 
  ## Contact
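
Note on the `normalize_embeddings=True` change in this diff: the sketch below illustrates why normalization matters for retrieval, and how it interacts with the Matryoshka truncation mentioned in the README. It assumes the flag produces unit-length (L2-normalized) vectors, which is how the parameter behaves in Sentence Transformers. The helper names and the toy 4-dimensional vectors are hypothetical stand-ins, not part of the `jina-clip-v2` API or real model output.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components (Matryoshka-style), then re-normalize."""
    return l2_normalize(vec[:dim])


# Toy vectors standing in for text and image embeddings (made-up values).
text_vec = [3.0, 4.0, 0.0, 12.0]
image_vec = [1.0, 2.0, 2.0, 4.0]

text_unit = l2_normalize(text_vec)
image_unit = l2_normalize(image_vec)

# For unit vectors, the dot product equals the cosine similarity.
assert math.isclose(dot(text_unit, image_unit), cosine(text_vec, image_vec))

# A truncated unit vector is generally no longer unit length,
# so it is re-normalized before computing dot-product similarity.
text_small = truncate_and_normalize(text_unit, 2)
image_small = truncate_and_normalize(image_unit, 2)
print(dot(text_small, image_small))
```

With normalized embeddings, a plain dot product can be used as the similarity score, which is why the diff adds the flag to every `encode` call. If embeddings are truncated after encoding, re-normalizing the shortened vectors keeps that property.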