michael-guenther committed on
Commit 79bfe60
1 Parent(s): 0eb77d9

Update README.md

Files changed (1): README.md (+50 -1)
README.md CHANGED
@@ -1141,7 +1141,24 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
</p>
</details>

- You can use Jina Embedding models directly from transformers package:
+ You can use Jina Embedding models directly from the `transformers` package.
+
+ First, you need to make sure that you are logged in to Hugging Face. You can either use the huggingface-cli tool (after installing the `transformers` package) and pass your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens) when prompted:
+ ```bash
+ huggingface-cli login
+ ```
+ Alternatively, you can provide the access token as an environment variable in the shell:
+ ```bash
+ export HF_TOKEN="<your token here>"
+ ```
+ or in Python:
+ ```python
+ import os
+
+ os.environ['HF_TOKEN'] = "<your token here>"
+ ```
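The same login can also be done programmatically; a minimal sketch, assuming the `huggingface_hub` package (installed as a dependency of `transformers`):

```python
# Programmatic login sketch: huggingface_hub ships as a dependency of transformers.
# The token value is a placeholder for your own access token.
from huggingface_hub import login

login(token="<your token here>")  # equivalent to running `huggingface-cli login`
```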
+
+ Then, you can load and use the model via the `AutoModel` class:
```python
!pip install transformers
from transformers import AutoModel
 
@@ -1175,6 +1192,28 @@ embeddings = model.encode(['How is the weather today?', '今天天气怎么样?'
print(cos_sim(embeddings[0], embeddings[1]))
```

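The unchanged middle of the `transformers` example is elided between the two hunks above. A self-contained sketch of the pattern the surrounding lines imply; the checkpoint name and the `cos_sim` helper are assumptions reconstructed from context:

```python
from numpy.linalg import norm
from transformers import AutoModel

# cosine-similarity helper matching the cos_sim(...) call shown above (assumed)
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# trust_remote_code=True opts in to the repository's custom model code;
# see the Troubleshooting section below. The checkpoint name is an assumption.
model = AutoModel.from_pretrained(
    'jinaai/jina-embeddings-v2-base-en',
    trust_remote_code=True
)

embeddings = model.encode(['How is the weather today?', '今天天气怎么样?'])
print(cos_sim(embeddings[0], embeddings[1]))
```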
+ With its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well):
+
+ ```python
+ !pip install -U sentence-transformers
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.util import cos_sim
+
+ model = SentenceTransformer(
+     "jinaai/jina-embeddings-v2-base-de",  # switch to en/zh for English or Chinese
+     trust_remote_code=True
+ )
+
+ # control your input sequence length up to 8192
+ model.max_seq_length = 1024
+
+ embeddings = model.encode([
+     'How is the weather today?',
+     'Wie ist das Wetter heute?'
+ ])
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
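Note that `max_seq_length` caps the tokenized input length: texts longer than the limit are truncated, so the 1024 set here trades long-document coverage for lower memory use; raise it, up to the model's 8192 maximum, when embedding longer documents.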
+
## Alternatives to Using Transformers Package

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
 
@@ -1188,6 +1227,16 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">

+ ## Troubleshooting
+
+ **Loading of Model Code Failed**
+
+ If you forget to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or when initializing the model via the `SentenceTransformer` class, you will receive an error saying that the model weights could not be initialized.
+ This is caused by `transformers` falling back to creating a default BERT model instead of a jina-embeddings model:
+
+ ```bash
+ Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
+ ```
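A minimal sketch of the fix, reusing the checkpoint named in the warning:

```python
from transformers import AutoModel

# Without trust_remote_code=True, transformers instantiates a plain BertModel,
# the jina-specific weights go unused, and the warning above is printed:
# model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en')

# Passing the flag loads the repository's custom model class instead:
model = AutoModel.from_pretrained(
    'jinaai/jina-embeddings-v2-base-en',
    trust_remote_code=True
)
```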
  ## Contact