File size: 7,852 Bytes
18ed68f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 |
---
pipeline_tag: sentence-similarity
tags:
- ctranslate2
- int8
- float16
- finetuner
- sentence-transformers
- feature-extraction
- sentence-similarity
datasets:
- jinaai/negation-dataset
language: en
license: apache-2.0
---
# # Fast-Inference with Ctranslate2
Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
quantized version of [jinaai/jina-embedding-t-en-v1](https://huggingface.co/jinaai/jina-embedding-t-en-v1)
```bash
pip install hf-hub-ctranslate2>=2.12.0 ctranslate2>=3.17.1
```
```python
# from transformers import AutoTokenizer
model_name = "michaelfeil/ct2fast-jina-embedding-t-en-v1"
model_name_orig="jinaai/jina-embedding-t-en-v1"
from hf_hub_ctranslate2 import EncoderCT2fromHfHub
model = EncoderCT2fromHfHub(
# load in int8 on CUDA
model_name_or_path=model_name,
device="cuda",
compute_type="int8_float16"
)
outputs = model.generate(
text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
max_length=64,
) # perform downstream tasks on outputs
outputs["pooler_output"]
outputs["last_hidden_state"]
outputs["attention_mask"]
# alternative, use SentenceTransformer Mix-In
# for end-to-end Sentence embeddings generation
# (not pulling from this CT2fast-HF repo)
from hf_hub_ctranslate2 import CT2SentenceTransformer
model = CT2SentenceTransformer(
model_name_orig, compute_type="int8_float16", device="cuda"
)
embeddings = model.encode(
["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
batch_size=32,
convert_to_numpy=True,
normalize_embeddings=True,
)
print(embeddings.shape, embeddings)
scores = (embeddings @ embeddings.T) * 100
# Hint: you can also host this code via REST API and
# via github.com/michaelfeil/infinity
```
Checkpoint compatible to [ctranslate2>=3.17.1](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
Converted on 2023-10-13 using
```
LLama-2 -> removed <pad> token.
```
# Licence and other remarks:
This is just a quantized version. Licence conditions are intended to be idential to original huggingface repo.
# Original description
<br><br>
<p align="center">
<img src="https://github.com/jina-ai/finetuner/blob/main/docs/_static/finetuner-logo-ani.svg?raw=true" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
</p>
<p align="center">
<b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>, <a href="https://github.com/jina-ai/finetuner"><b>Finetuner</b></a> team.</b>
</p>
## Intented Usage & Model Info
`jina-embedding-t-en-v1` is a tiny language model that has been trained using Jina AI's Linnaeus-Clean dataset.
This dataset consists of 380 million pairs of sentences, which include both query-document pairs.
These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
The Linnaeus-Full dataset, from which the Linnaeus-Clean dataset is derived, originally contained 1.6 billion sentence pairs.
The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
With a tiny small parameter size of just 14 million parameters,
the model enables lightning-fast inference on CPU, while still delivering impressive performance.
Additionally, we provide the following options:
- [`jina-embedding-t-en-v1`](https://huggingface.co/jinaai/jina-embedding-t-en-v1): 14 million parameters **(you are here)**.
- [`jina-embedding-s-en-v1`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
- [`jina-embedding-b-en-v1`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
- [`jina-embedding-l-en-v1`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
- `jina-embedding-1b-en-v1`: 1.2 billion parameters, 10 times bert-base (soon).
- `jina-embedding-6b-en-v1`: 6 billion parameters, 30 times bert-base (soon).
## Data & Parameters
Please checkout our [technical blog](https://arxiv.org/abs/2307.11224).
## Metrics
We compared the model against `all-minilm-l6-v2`/`all-mpnet-base-v2` from sbert and `text-embeddings-ada-002` from OpenAI:
|Name|param |dimension|
|------------------------------|-----|------|
|all-minilm-l6-v2|23m |384|
|all-mpnet-base-v2 |110m |768|
|ada-embedding-002|Unknown/OpenAI API |1536|
|jina-embedding-t-en-v1|14m |312|
|jina-embedding-s-en-v1|35m |512|
|jina-embedding-b-en-v1|110m |768|
|jina-embedding-l-en-v1|330m |1024|
|Name|STS12|STS13|STS14|STS15|STS16|STS17|TRECOVID|Quora|SciFact|
|------------------------------|-----|-----|-----|-----|-----|-----|--------|-----|-----|
|all-minilm-l6-v2|0.724|0.806|0.756|0.854|0.79 |0.876|0.473 |0.876|0.645 |
|all-mpnet-base-v2|0.726|**0.835**|0.78 |0.857|0.8 |**0.906**|0.513 |0.875|0.656 |
|ada-embedding-002|0.698|0.833|0.761|0.861|**0.86** |0.903|**0.685** |0.876|**0.726** |
|jina-embedding-t-en-v1|0.717|0.773|0.731|0.829|0.777|0.860|0.482 |0.840|0.522 |
|jina-embedding-s-en-v1|0.743|0.786|0.738|0.837|0.80|0.875|0.523 |0.857|0.524 |
|jina-embedding-b-en-v1|**0.751**|0.809|0.761|0.856|0.812|0.890|0.606 |0.876|0.594 |
|jina-embedding-l-en-v1|0.745|0.832|**0.781**|**0.869**|0.837|0.902|0.573 |**0.881**|0.598 |
## Inference Speed
We encoded a single sentence "What is the current weather like today?" 10k times on:
1. cpu: MacBook Pro 2020, 2 GHz Quad-Core Intel Core i5
2. gpu: 1 Nvidia 3090
And recorded time spent to demonstrate the embedding speed:
|Name|param |dimension| time@cpu | time@gpu |
|------------------------------|-----|------|-----|-----|
|jina-embedding-t-en-v1|14m |312| 5.78s | 2.36s|
|all-minilm-l6-v2|23m |384| 11.95s | 2.70s |
|jina-embedding-s-en-v1|35m |512| 17.25s | 2.81s |
## Usage
Use with Jina AI Finetuner
```python
!pip install finetuner
import finetuner
model = finetuner.build_model('jinaai/jina-embedding-t-en-v1')
embeddings = finetuner.encode(
model=model,
data=['how is the weather today', 'What is the current weather like today?']
)
print(finetuner.cos_sim(embeddings[0], embeddings[1]))
```
Use with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
sentences = ['how is the weather today', 'What is the current weather like today?']
model = SentenceTransformer('jinaai/jina-embedding-t-en-v1')
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))
```
## Fine-tuning
Please consider [Finetuner](https://github.com/jina-ai/finetuner).
## Plans
1. The development of `jina-embedding-s-en-v2` is currently underway with two main objectives: improving performance and increasing the maximum sequence length.
2. We are currently working on a bilingual embedding model that combines English and X language. The upcoming model will be called `jina-embedding-s/b/l-de-v1`.
## Contact
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
## Citation
If you find Jina Embeddings useful in your research, please cite the following paper:
``` latex
@misc{günther2023jina,
title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
year={2023},
eprint={2307.11224},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
``` |