michaelfeil/ct2fast-e5-small-v2

@@ -2608,21 +2608,11 @@ Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on
 quantized version of [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.8 ctranslate2>=3.16.0
-```
-Converted on 2023-06-15 using
-```
-ct2-transformers-converter --model intfloat/e5-small-v2 --output_dir ~/tmp-ct2fast-e5-small-v2 --force --copy_files tokenizer.json README.md tokenizer_config.json vocab.txt special_tokens_map.json .gitattributes --quantization float16 --trust_remote_code
 ```
-Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
-and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2)
-- `compute_type=int8_float16` for `device="cuda"`
-- `compute_type=int8`  for `device="cpu"`
 ```python
-from transformers import AutoTokenizer
 model_name = "michaelfeil/ct2fast-e5-small-v2"
 from hf_hub_ctranslate2 import EncoderCT2fromHfHub
@@ -2633,10 +2623,25 @@ model = EncoderCT2fromHfHub(
         compute_type="float16",
         # tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
 )
-outputs = model.generate(
-    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
 )
-print(outputs.shape, outputs)
 ```
 # Licence and other remarks:

 quantized version of [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2)
 ```bash
+pip install hf-hub-ctranslate2>=2.10.0 ctranslate2>=3.16.0
 ```
 ```python
+# from transformers import AutoTokenizer
 model_name = "michaelfeil/ct2fast-e5-small-v2"
 from hf_hub_ctranslate2 import EncoderCT2fromHfHub
         compute_type="float16",
         # tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
 )
+embeddings = model.encode(
+    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
+    batch_size=32,
+    convert_to_numpy=True,
+    normalize_embeddings=True,
 )
+print(embeddings.shape, embeddings)
+scores = (embeddings @ embeddings.T) * 100
+```
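The `scores` line in the snippet above relies on `normalize_embeddings=True`: when every row has unit length, the matrix product `embeddings @ embeddings.T` is the pairwise cosine similarity, here scaled by 100. The following is a minimal NumPy sketch of that relationship. It uses random stand-in vectors rather than real model output, and the helper name `cosine_scores` and the array shape are illustrative assumptions, not part of `hf_hub_ctranslate2`.

```python
import numpy as np


def cosine_scores(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity scaled by 100 (hypothetical helper)."""
    # Normalize each row to unit length, which is what
    # normalize_embeddings=True is expected to do for model output.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    # For unit-length rows, the dot product equals the cosine similarity.
    return (unit @ unit.T) * 100


# Random stand-in for three sentence embeddings (not real model output).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(3, 8))

scores = cosine_scores(fake_embeddings)
print(scores.shape)
print(np.round(scores, 2))
```

Each diagonal entry is 100 up to floating-point error, since every vector has cosine similarity 1 with itself, and the matrix is symmetric.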
+Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
+and [hf-hub-ctranslate2>=2.10.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
+- `compute_type=int8_float16` for `device="cuda"`
+- `compute_type=int8`  for `device="cpu"`
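The two recommendations above can be expressed as a small lookup. This is only a sketch: `pick_compute_type` is a hypothetical helper, not a function of `ctranslate2` or `hf_hub_ctranslate2`, and it covers only the two devices listed.

```python
def pick_compute_type(device: str) -> str:
    """Return the compute_type recommended above for a device (hypothetical helper)."""
    recommended = {
        "cuda": "int8_float16",  # recommended for device="cuda"
        "cpu": "int8",           # recommended for device="cpu"
    }
    if device not in recommended:
        raise ValueError(f"no recommendation listed for device={device!r}")
    return recommended[device]


print(pick_compute_type("cuda"))
print(pick_compute_type("cpu"))
```

The returned string would then be passed as the `compute_type` argument when constructing `EncoderCT2fromHfHub`, alongside the matching `device`.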
+Converted on 2023-06-16 using
+```
+ct2-transformers-converter --model intfloat/e5-small-v2 --output_dir ~/tmp-ct2fast-e5-small-v2 --force --copy_files tokenizer.json README.md tokenizer_config.json vocab.txt special_tokens_map.json .gitattributes --quantization float16 --trust_remote_code
 ```
 # Licence and other remarks: