Update README
README.md
CHANGED
@@ -4,6 +4,7 @@ tags:
|
|
4 |
- feature-extraction
|
5 |
- sentence-similarity
|
6 |
- mteb
|
|
|
7 |
license: apache-2.0
|
8 |
language:
|
9 |
- en
|
@@ -1073,6 +1074,9 @@ model-index:
|
|
1073 |
<b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
|
1074 |
</p>
|
1075 |
|
1076 |
|
1077 |
## Intended Usage & Model Info
|
1078 |
|
@@ -1088,13 +1092,17 @@ Additionally, we provide the following embedding models:
|
|
1088 |
|
1089 |
- [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
|
1090 |
- [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
|
1091 |
-
- [`jina-embeddings-v2-base-zh`](): Chinese-English Bilingual embeddings
|
1092 |
-
- [`jina-embeddings-v2-base-de`](): German-English Bilingual embeddings
|
1093 |
-
- [`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon).
|
1094 |
|
1095 |
## Data & Parameters
|
1096 |
|
1097 |
-
|
|
|
1098 |
|
1099 |
## Usage
|
1100 |
|
@@ -1157,9 +1165,23 @@ embeddings = model.encode(
|
|
1157 |
)
|
1158 |
```
|
1159 |
|
1160 |
-
|
1161 |
|
1162 |
-
|
1163 |
|
1164 |
## Use Jina Embeddings for RAG
|
1165 |
|
@@ -1170,12 +1192,6 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
|
|
1170 |
<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
|
1171 |
|
1172 |
|
1173 |
-
## Plans
|
1174 |
-
|
1175 |
-
1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
|
1176 |
-
2. Multimodal embedding models enable Multimodal RAG applications.
|
1177 |
-
3. High-performt rerankers.
|
1178 |
-
|
1179 |
## Contact
|
1180 |
|
1181 |
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
|
|
|
4 |
- feature-extraction
|
5 |
- sentence-similarity
|
6 |
- mteb
|
7 |
+
inference: false
|
8 |
license: apache-2.0
|
9 |
language:
|
10 |
- en
|
|
|
1074 |
<b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
|
1075 |
</p>
|
1076 |
|
1077 |
+
## Quick Start
|
1078 |
+
|
1079 |
+
The easiest way to start using `jina-embeddings-v2-base-de` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).
|
1080 |
|
1081 |
## Intended Usage & Model Info
|
1082 |
|
|
|
1092 |
|
1093 |
- [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
|
1094 |
- [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
|
1095 |
+
- [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English Bilingual embeddings. **(you are here)**
|
1096 |
+
- [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English Bilingual embeddings.
|
1097 |
+
- _[`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon)._
|
1098 |
+
- _Bilingual embedding models in other world languages (soon)._
|
1099 |
+
- _Multimodal-input embedding model (soon)._
|
1100 |
+
- _High-performing reranking model (soon)._
|
1101 |
|
1102 |
## Data & Parameters
|
1103 |
|
1104 |
+
We will publish a report with technical details about the training of the bilingual models soon.
|
1105 |
+
The training of the English model is described in this [technical report](https://arxiv.org/abs/2310.19923).
|
1106 |
|
1107 |
## Usage
|
1108 |
|
|
|
1165 |
)
|
1166 |
```
|
1167 |
|
1168 |
+
If you want to use the model together with the [sentence-transformers package](https://github.com/UKPLab/sentence-transformers/), make sure that you have installed the latest release and set `trust_remote_code=True` as well:
|
1169 |
|
1170 |
+
```
|
1171 |
+
!pip install -U sentence-transformers
|
1172 |
+
from sentence_transformers import SentenceTransformer
|
1173 |
+
from numpy.linalg import norm
|
1174 |
+
|
1175 |
+
cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
|
1176 |
+
model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
|
1177 |
+
embeddings = model.encode(['How is the weather today?', 'Wie ist das Wetter heute?'])
|
1178 |
+
print(cos_sim(embeddings[0], embeddings[1]))
|
1179 |
+
```
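
The `cos_sim` lambda in the snippet above compares a single pair of vectors. When ranking several candidate sentences against a query, it can be convenient to normalize the rows once and compute all similarities with one matrix product. The following is a minimal sketch of that idea, not part of the model's API: the helper name `cosine_similarity_matrix` is hypothetical, and the toy 3-dimensional vectors are assumptions standing in for the arrays that `model.encode(...)` would return.

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities with shape (len(queries), len(candidates)).

    Hypothetical helper for illustration; both inputs are 2-D arrays with
    one embedding per row and no all-zero rows.
    """
    # Scale every row to unit length so the dot product equals the cosine.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return q @ c.T


# Toy vectors (assumed data) in place of real model.encode(...) output.
queries = np.array([[1.0, 0.0, 0.0]])
candidates = np.array([
    [2.0, 0.0, 0.0],  # same direction as the query
    [0.0, 3.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
])

scores = cosine_similarity_matrix(queries, candidates)
# Candidate indices sorted from most to least similar to the first query.
ranking = np.argsort(-scores[0])
print(scores)
print(ranking)
```

With real embeddings, the two arrays passed to the helper would be the encoded query sentences and the encoded candidate sentences.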
|
1180 |
+
|
1181 |
+
## Alternatives to Using Transformers Package
|
1182 |
+
|
1183 |
+
1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
|
1184 |
+
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
|
1185 |
|
1186 |
## Use Jina Embeddings for RAG
|
1187 |
|
|
|
1192 |
<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
|
1193 |
|
1194 |
|
1195 |
## Contact
|
1196 |
|
1197 |
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
|