liuqi6777 committed on
Commit 9c8f958
1 Parent(s): 5424617

Update README

Files changed (1)
  1. README.md +28 -12
README.md CHANGED
@@ -4,6 +4,7 @@ tags:
  - feature-extraction
  - sentence-similarity
  - mteb
  license: apache-2.0
  language:
  - en
@@ -1073,6 +1074,9 @@ model-index:
  <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
  </p>


  ## Intended Usage & Model Info

@@ -1088,13 +1092,17 @@ Additionally, we provide the following embedding models:

  - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
  - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
- - [`jina-embeddings-v2-base-zh`](): Chinese-English Bilingual embeddings (soon) **(you are here)**.
- - [`jina-embeddings-v2-base-de`](): German-English Bilingual embeddings (soon).
- - [`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon).

  ## Data & Parameters

- Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)

  ## Usage

@@ -1157,9 +1165,23 @@ embeddings = model.encode(
  )
  ```

- ## Fully-managed Embeddings Service

- Alternatively, you can use Jina AI's [Embedding platform](https://jina.ai/embeddings/) for fully-managed access to Jina Embeddings models.

  ## Use Jina Embeddings for RAG

@@ -1170,12 +1192,6 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
  <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">


- ## Plans
-
- 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
- 2. Multimodal embedding models enabling multimodal RAG applications.
- 3. High-performance rerankers.
-
  ## Contact

  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
 
  - feature-extraction
  - sentence-similarity
  - mteb
+ inference: false
  license: apache-2.0
  language:
  - en
 
  <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
  </p>

+ ## Quick Start
+
+ The easiest way to start using `jina-embeddings-v2-base-de` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).

  ## Intended Usage & Model Info

 

  - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
  - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
+ - [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters, Chinese-English Bilingual embeddings. **(you are here)**
+ - [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters, German-English Bilingual embeddings.
+ - _[`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon)._
+ - _Bilingual embedding models in other world languages (soon)._
+ - _Multimodal-input embedding model (soon)._
+ - _High-performing reranking model (soon)._

  ## Data & Parameters

+ We will publish a report with technical details about the training of the bilingual models soon.
+ The training of the English model is described in this [technical report](https://arxiv.org/abs/2310.19923).

  ## Usage

 
  )
  ```

+ If you want to use the model together with the [sentence-transformers package](https://github.com/UKPLab/sentence-transformers/), make sure that you have installed the latest release and set `trust_remote_code=True` as well:

+ ```
+ !pip install -U sentence-transformers
+ from sentence_transformers import SentenceTransformer
+ from numpy.linalg import norm
+
+ cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
+ model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
+ embeddings = model.encode(['How is the weather today?', 'Wie ist das Wetter heute?'])
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
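The `cos_sim` lambda in the sentence-transformers example compares one pair of vectors. To rank many texts at once, a common pattern is to L2-normalize every embedding and take a single matrix product. The following is a minimal NumPy sketch of that pattern. `pairwise_cos_sim` is a hypothetical helper and is not part of sentence-transformers. The small hand-written vectors stand in for the 2-D array that `model.encode` returns for a list of sentences.

```python
import numpy as np


def pairwise_cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cosine similarity of every row of `a` with every row of `b`.

    a has shape (n, d), b has shape (m, d); the result has shape (n, m).
    """
    # Scale each row to unit L2 length so a plain dot product equals the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy 3-dimensional vectors standing in for real model embeddings.
queries = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 1.0]])
docs = np.array([[2.0, 0.0, 0.0],
                 [0.0, 0.0, 3.0],
                 [1.0, 1.0, 0.0]])

scores = pairwise_cos_sim(queries, docs)
# Index of the most similar document for each query.
best_doc_per_query = scores.argmax(axis=1)
print(scores.round(3))
print(best_doc_per_query)
```

Normalizing once and then multiplying matrices avoids a Python loop over every pair. An all-zero row would cause a division by zero, so real code may want to guard against that case.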
+
+ ## Alternatives to Using Transformers Package
+
+ 1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
+ 2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploying them on [AWS SageMaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
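The managed Embedding API option is reached over HTTP. The following sketch uses only the Python standard library and shows roughly what such a call could look like. Several details are not stated in this README and are assumptions:

- the endpoint `https://api.jina.ai/v1/embeddings`
- a JSON body with `input` and `model` fields
- bearer-token authentication
- an OpenAI-style `data[i]["embedding"]` response shape
- the `JINA_API_KEY` environment variable name

Check the [Embedding API](https://jina.ai/embeddings/) page before relying on any of these.

```python
import json
import os
import urllib.request

# ASSUMPTION: endpoint, payload fields and response shape are not documented in this
# README; verify them against https://jina.ai/embeddings/ before use.
API_URL = "https://api.jina.ai/v1/embeddings"


def build_embedding_request(texts, model, api_key):
    """Build (without sending) a POST request that asks the API to embed `texts`."""
    payload = {"input": list(texts), "model": model}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(texts, model, api_key=None):
    """Send the request and return one embedding (a list of floats) per input text."""
    # JINA_API_KEY is a hypothetical environment variable name chosen for this sketch.
    api_key = api_key or os.environ["JINA_API_KEY"]
    request = build_embedding_request(texts, model, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # ASSUMPTION: OpenAI-style response body {"data": [{"embedding": [...]}, ...]}.
    return [item["embedding"] for item in body["data"]]


# Only performs a network call when an API key is present in the environment.
if __name__ == "__main__" and os.environ.get("JINA_API_KEY"):
    vectors = embed(
        ["How is the weather today?", "Wie ist das Wetter heute?"],
        model="jina-embeddings-v2-base-de",
    )
    print(len(vectors), len(vectors[0]))
```

Keeping request construction separate from sending makes the payload easy to inspect or test offline.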

  ## Use Jina Embeddings for RAG


  <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">


  ## Contact

  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.