jupyterjazz committed on
Commit 8ea62b5
1 Parent(s): af01f51

Update README.md

Files changed (1)
  1. README.md +37 -18
README.md CHANGED
@@ -66,7 +66,7 @@ language:
  - my
  - ne
  - nl
- - 'no'
+ - no
  - om
  - or
  - pa
@@ -201,37 +201,56 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
  </p>
  </details>

- 1. The easiest way to starting using jina-clip-v1-en is to use Jina AI's [Embeddings API](https://jina.ai/embeddings/).
- 2. Alternatively, you can use Jina CLIP directly via transformers package.
+ 1. The easiest way to get started with `jina-embeddings-v3` is to use Jina AI's [Embeddings API](https://jina.ai/embeddings/).
+ 2. Alternatively, you can use `jina-embeddings-v3` directly via the transformers package.

  ```python
- !pip install transformers einops flash_attn
+ !pip install transformers
  from transformers import AutoModel

  # Initialize the model
  model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)

- # New meaningful sentences
- sentences = [
-     "Organic skincare for sensitive skin with aloe vera and chamomile.",
-     "New makeup trends focus on bold colors and innovative techniques",
-     "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
-     "Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken",
-     "Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla",
-     "Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras",
-     "针对敏感肌专门设计的天然有机护肤产品",
-     "新的化妆趋势注重鲜艳的颜色和创新的技巧",
-     "敏感肌のために特別に設計された天然有機スキンケア製品",
-     "新しいメイクのトレンドは鮮やかな色と革新的な技術に焦点を当てています",
+ texts = [
+     'Follow the white rabbit.',  # English
+     'Sigue al conejo blanco.',  # Spanish
+     'Suis le lapin blanc.',  # French
+     '跟着白兔走。',  # Chinese
+     'اتبع الأرنب الأبيض.',  # Arabic
+     'Folge dem weißen Kaninchen.'  # German
  ]

- # Encode sentences
- embeddings = model.encode(sentences, truncate_dim=1024, task_type='index') # TODO UPDATE
+ # When calling the `encode` function, you can choose a task_type based on the use case:
+ # 'retrieval.query', 'retrieval.passage', 'separation', 'classification', 'text-matching'
+ # Alternatively, you can choose not to pass a task_type, and no specific LoRA adapter will be used.
+ embeddings = model.encode(texts, task_type='text-matching')

  # Compute similarities
  print(embeddings[0] @ embeddings[1].T)
  ```
 
+ By default, the model supports a maximum sequence length of 8192 tokens.
+ However, if you want to truncate your input texts to a shorter length, you can pass the `max_length` parameter to the `encode` function:
+ ```python
+ embeddings = model.encode(
+     ['Very long ... document'],
+     max_length=2048
+ )
+ ```
+
+ If you want to use Matryoshka embeddings and reduce the embedding dimension, pass the `truncate_dim` parameter to the `encode` function:
+ ```python
+ embeddings = model.encode(
+     ['Sample text'],
+     truncate_dim=256
+ )
+ ```
+
 
  ## Performance
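
The usage snippet in this diff encodes every text with `task_type='text-matching'`. For asymmetric search, the same comment block lists separate `'retrieval.query'` and `'retrieval.passage'` adapters; the sketch below shows one way they might be combined. It assumes `encode` returns NumPy-like arrays (as the `embeddings[0] @ embeddings[1].T` line suggests) and computes cosine similarity explicitly rather than relying on the outputs being normalized; the query and passages are made-up examples, not content from this commit.

```python
# Sketch: ranking passages for a query with the task-specific adapters listed above.
# Assumption: `encode` returns NumPy-like arrays; cosine similarity is computed
# explicitly instead of assuming the outputs are already L2-normalized.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)

query = 'Which planet is known as the Red Planet?'  # made-up example query
passages = [                                         # made-up example passages
    'Mars is often called the Red Planet because of its reddish appearance.',
    'Jupiter is the largest planet in the Solar System.',
]

# Queries and passages use different LoRA adapters
query_emb = model.encode([query], task_type='retrieval.query')[0]
passage_embs = model.encode(passages, task_type='retrieval.passage')

def cos_sim(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cos_sim(query_emb, p) for p in passage_embs]
print(scores)                          # the Mars passage should score highest
print(passages[int(np.argmax(scores))])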
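```

Likewise, a minimal sketch of the Matryoshka option added above: it encodes the same pair of texts at the default dimension and at `truncate_dim=256`, then compares them with an explicit cosine similarity. The printed shapes and the expectation that the truncated vectors behave similarly are illustrative assumptions, not statements from the README.

```python
# Sketch: comparing full-dimension and Matryoshka-truncated embeddings.
# Assumption: `encode` returns NumPy-like arrays whose default dimension
# is larger than 256; the printed shapes make this visible.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)

texts = ['Follow the white rabbit.', 'Sigue al conejo blanco.']  # reused from the example above

full = model.encode(texts, task_type='text-matching')
small = model.encode(texts, task_type='text-matching', truncate_dim=256)
print(full.shape, small.shape)  # e.g. (2, D) vs. (2, 256)

def cos_sim(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two translations should remain close in both the full and the truncated space
print(cos_sim(full[0], full[1]), cos_sim(small[0], small[1]))
```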