guenthermi committed
Commit 64cb362
Parent(s): a4ba7b8
update readme

- README.md +34 -15
- de_evaluation_results.png +0 -0
README.md
CHANGED
@@ -3,7 +3,6 @@ tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
-- mteb
 language:
 - de
 - en
@@ -3109,7 +3108,7 @@ model-index:
 <br><br>
 
 <p align="center">
-<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="
+<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Jina AI logo: Jina AI is your Portal to Multimodal AI" width="150px">
 </p>
 
 
@@ -3117,6 +3116,9 @@ model-index:
 <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
 </p>
 
+## Quick Start
+
+The easiest way to starting using `jina-embeddings-v2-base-de` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).
 
 ## Intended Usage & Model Info
 
@@ -3135,13 +3137,17 @@ Des Weiteren stellen wir folgende Embedding-Modelle bereit:
 
 - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
 - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
-- [`jina-embeddings-v2-base-zh`](): Chinese-English Bilingual embeddings
-- [`jina-embeddings-v2-base-de`](): German-English Bilingual embeddings
-- [`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon).
+- [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English Bilingual embeddings.
+- [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English Bilingual embeddings **(you are here)**.
+- _[`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon)._
+- _Bilingual embedding models in other world languages (soon)._
+- _Multimodal-input embedding model (soon)._
+- _High-performing reranking model (soon)._
 
 ## Data & Parameters
 
-
+We will publish a report with technical details about the training of the bilingual models soon.
+The training of the English model is described in this [technical report](https://arxiv.org/abs/2310.19923).
 
 ## Usage
 
@@ -3204,9 +3210,29 @@ embeddings = model.encode(
 )
 ```
 
-
+If you want to use the model together with the [sentence-transformers package](https://github.com/UKPLab/sentence-transformers/), make sure that you have installed the latest release and set `trust_remote_code=True` as well:
+
+```
+!pip install -U sentence-transformers
+from sentence_transformers import SentenceTransformer
+from numpy.linalg import norm
+
+cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
+model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
+embeddings = model.encode(['How is the weather today?', 'Wie ist das Wetter heute?'])
+print(cos_sim(embeddings[0], embeddings[1]))
+```
+
+## Alternatives to Using Transformers Package
+
+1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
+2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
+
+## Benchmark Results
 
-
+We evaluated our Bilingual model on all German and English evaluation tasks availble on the [MTEB benchmark](https://huggingface.co/blog/mteb). In addition, we evaluated the models agains a couple of other German, English, and multilingual models on additional German evaluation tasks:
+
+<img src="de_evaluation_results.png" width="780px">
 
 ## Use Jina Embeddings for RAG
 
@@ -3216,13 +3242,6 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
 
 <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
 
-
-## Plans
-
-1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
-2. Multimodal embedding models enable Multimodal RAG applications.
-3. High-performt rerankers.
-
 ## Contact
 
 Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
de_evaluation_results.png
ADDED
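The sentence-transformers snippet added to the README defines `cos_sim` as a lambda over NumPy arrays. For 1-D arrays `.T` is a no-op, so `a @ b.T` is simply the dot product, and dividing by the product of the two Euclidean norms yields cosine similarity. The sketch below restates that computation as a named function with a guard against zero vectors. It does not download or run the model: the three-dimensional vectors are made-up, hypothetical stand-ins for `model.encode(...)` output, and `l2_normalize` is an illustrative helper, not part of the README or of sentence-transformers.

```python
import numpy as np


def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D embedding vectors.

    Same formula as the README lambda: dot product divided by the
    product of the two Euclidean norms.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(a @ b / denom)


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale a vector to unit length."""
    v = np.asarray(v, dtype=np.float64)
    n = np.linalg.norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return v / n


# Hypothetical stand-ins for model.encode(...) output; real embeddings
# from jina-embeddings-v2-base-de have many more dimensions.
emb_en = np.array([0.2, 0.1, 0.7])
emb_de = np.array([0.25, 0.05, 0.65])

print(cos_sim(emb_en, emb_de))
# For unit-length vectors the plain dot product gives the same value.
print(float(l2_normalize(emb_en) @ l2_normalize(emb_de)))
```

If the embeddings are already unit length (sentence-transformers' `encode` accepts `normalize_embeddings=True`), the plain dot product and `cos_sim` agree, which avoids recomputing norms for every comparison.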