michael-guenther committed on
Commit
ccb5985
1 Parent(s): 8f6a4d8

Update README.md

Files changed (1)
  1. README.md +51 -1
README.md CHANGED
@@ -3187,7 +3187,24 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
  </p>
  </details>
 
- You can use Jina Embedding models directly from transformers package:
  ```python
  !pip install transformers
  from transformers import AutoModel
@@ -3208,6 +3225,28 @@ embeddings = model.encode(
  )
  ```
 
  ## Alternatives to Using Transformers Package
 
  1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
@@ -3227,6 +3266,17 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
 
  <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
 
  ## Contact
 
  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
 
  </p>
  </details>
 
+ You can use Jina Embedding models directly from the `transformers` package.
+
+ First, make sure that you are logged into Hugging Face. You can either use the `huggingface-cli` tool (available after installing the `transformers` package) and enter your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens) when prompted:
+ ```bash
+ huggingface-cli login
+ ```
+ Alternatively, you can provide the access token as an environment variable in the shell:
+ ```bash
+ export HF_TOKEN="<your token here>"
+ ```
+ or in Python:
+ ```python
+ import os
+
+ os.environ['HF_TOKEN'] = "<your token here>"
+ ```
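Before calling `from_pretrained`, it can help to check whether a token was provided through the environment and print a clear hint otherwise. The sketch below is a hypothetical helper (`get_hf_token` is not part of `transformers` or `huggingface_hub`). It only inspects the `HF_TOKEN` environment variable described above and does not detect a token cached by `huggingface-cli login`, so a `None` result means "not set via the environment" rather than "not logged in".

```python
import os
from typing import Optional

# Placeholder string used in the README snippets above.
PLACEHOLDER = "<your token here>"


def get_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `env_var`, or None if it is effectively unset.

    Hypothetical helper: an empty value or the README placeholder counts as
    "not set". Credentials cached by `huggingface-cli login` are NOT checked.
    """
    token = os.environ.get(env_var, "").strip()
    if not token or token == PLACEHOLDER:
        return None
    return token
```

A caller could, for example, print a reminder to run `huggingface-cli login` when the function returns `None`, while still letting `from_pretrained` pick up a cached login.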
+
+ Then, you can load and use the model via the `AutoModel` class:
  ```python
  !pip install transformers
  from transformers import AutoModel
 
  )
  ```
 
+ As of release v2.3.0, `sentence-transformers` also supports Jina embeddings (make sure that you are logged into Hugging Face here as well):
+
+ ```python
+ !pip install -U sentence-transformers
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.util import cos_sim
+
+ model = SentenceTransformer(
+     "jinaai/jina-embeddings-v2-base-de", # switch to en/zh for English or Chinese
+     trust_remote_code=True
+ )
+
+ # set the maximum input sequence length (up to 8192 tokens)
+ model.max_seq_length = 1024
+
+ embeddings = model.encode([
+     'How is the weather today?',
+     'Wie ist das Wetter heute?'
+ ])
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
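The script above prints the cosine similarity of the English and German sentence embeddings; a value close to 1 means the model places both sentences close together. To illustrate what that number is, here is a dependency-free sketch of the formula cos(a, b) = a · b / (‖a‖ ‖b‖). The function name `cosine_similarity` is our own; this is not the `sentence_transformers.util.cos_sim` implementation, which works on tensors and returns a similarity matrix (shape 1 × 1 for two single vectors).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Dot product and Euclidean norms, computed element by element.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([1.0, 0.0], [1.0, 1.0])` returns about 0.7071 (1/√2). Because the function only iterates over its inputs, it should also accept the 1-D NumPy arrays returned by `model.encode` (an assumption; the library's own `cos_sim` remains the practical choice).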
+
  ## Alternatives to Using Transformers Package
 
  1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
 
 
  <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
 
+ ## Troubleshooting
+
+ **Loading of Model Code failed**
+
+ If you forget to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or when initializing the model via the `SentenceTransformer` class, you will see a warning that some of the model weights were not used when initializing the model.
+ This happens because `transformers` falls back to creating a default BERT model instead of a Jina embedding model:
+
+ ```text
+ Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
+ ```
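The warning above names `BertModel`, which suggests a quick programmatic check after loading. The helper below is hypothetical (`is_fallback_bert` is not part of `transformers`). It assumes that a model loaded without `trust_remote_code=True` is an instance of the plain `BertModel` class, while the remote Jina code provides a model class with a different name.

```python
def is_fallback_bert(model: object) -> bool:
    """Heuristic: True if `model` is an instance of a class named `BertModel`.

    Assumption: loading without `trust_remote_code=True` falls back to the
    plain `BertModel` (as the warning above shows), whereas the Jina remote
    code defines its own model class with a different name.
    """
    return type(model).__name__ == "BertModel"


# Example usage (requires network access and a Hugging Face login):
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained(
#       "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
#   )
#   if is_fallback_bert(model):
#       raise RuntimeError("Reload the model with trust_remote_code=True")
```

For a `SentenceTransformer`, the underlying Hugging Face model is typically reachable as `model[0].auto_model` (assuming the default module layout), and the same check can be applied to it.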
+
  ## Contact
 
  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.