jupyterjazz committed
Commit af01f51
Parent: 3a2c2be

readme: usage and performance

Files changed (1): README.md (+41, -0)
README.md CHANGED
@@ -160,6 +160,47 @@ The data and training details are described in the technical report (coming soon
 
 
 ## Usage
 
+<details><summary><strong>Apply mean pooling when integrating the model.</strong></summary>
+<p>
+
+### Why Use Mean Pooling?
+
+Mean pooling takes all token embeddings from the model's output and averages them at the sentence or paragraph level.
+This approach has been shown to produce high-quality sentence embeddings.
+
+We provide an `encode` function that handles this for you automatically.
+
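+For example, a minimal sketch of the `encode` path (a hypothetical call shape; the exact
+signature is documented on the model card):
+
+```python
+from transformers import AutoModel
+
+# trust_remote_code=True loads the custom modeling code that defines `encode`
+model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)
+
+# `encode` tokenizes, runs the model, mean-pools, and returns one vector per input
+embeddings = model.encode(['How is the weather today?'])
+```
+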
+However, if you're working with the model directly, outside of the `encode` function,
+you'll need to apply mean pooling manually. Here's how you can do it:
+
+```python
+import torch
+import torch.nn.functional as F
+from transformers import AutoTokenizer, AutoModel
+
+def mean_pooling(model_output, attention_mask):
+    # Average token embeddings, masking out padding tokens
+    token_embeddings = model_output[0]
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+sentences = ['How is the weather today?', 'What is the current weather like today?']
+
+tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
+model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)
+
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+# Pool token embeddings into sentence embeddings, then L2-normalize
+embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+embeddings = F.normalize(embeddings, p=2, dim=1)
+```
+
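+Since the embeddings are L2-normalized, cosine similarity reduces to a plain dot product.
+A quick illustrative check, continuing the snippet above:
+
+```python
+# Cosine similarity between the two example sentences (dot product of unit vectors)
+similarity = embeddings[0] @ embeddings[1]
+print(similarity.item())
+```
+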
+</p>
+</details>
+
 1. The easiest way to start using jina-clip-v1-en is Jina AI's [Embeddings API](https://jina.ai/embeddings/).
 2. Alternatively, you can use Jina CLIP directly via the transformers package, as sketched below.
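+
+A minimal sketch of option 2 (assuming the `jinaai/jina-clip-v1` repository and that its
+remote code exposes `encode_text`/`encode_image`; check the model card for the exact API):
+
+```python
+from transformers import AutoModel
+
+# The custom modeling code supplies the text and image towers
+model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
+
+# Hypothetical helper names: one encoder per modality
+text_embeddings = model.encode_text(['A photo of a cat'])
+image_embeddings = model.encode_image(['https://example.com/cat.jpg'])
+```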