Update README with instructions for usage with infinity

#39
Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -5622,6 +5622,18 @@ scores = (embeddings[:2] @ embeddings[2:].T) * 100
  print(scores.tolist())
  ```
 
+ ## Infinity_emb
+
+ Usage via [infinity](https://github.com/michaelfeil/infinity), a MIT Licensed inference server.
+
+ ```
+ # requires ~16-32GB VRAM NVIDIA Compute Capability >= 8.0
+ docker run \
+ -v $PWD/data:/app/.cache --gpus "0" -p "7997":"7997" \
+ michaelf34/infinity:0.0.68-trt-onnx \
+ v2 --model-id Alibaba-NLP/gte-Qwen2-7B-instruct --revision "refs/pr/38" --dtype bfloat16 --batch-size 8 --device cuda --engine torch --port 7997 --no-bettertransformer
+ ```
+
  ## Evaluation
 
  ### MTEB & C-MTEB
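
Once the container started by the `docker run` command in this diff is up, the server can presumably be queried over HTTP. The following is a minimal sketch rather than part of the PR. It assumes that infinity serves an OpenAI-style `POST /embeddings` route on the mapped port 7997 and returns vectors under `data[i].embedding`. The helper names `build_embedding_request` and `embed`, the `localhost` base URL, and the sample input text are illustrative.

```python
import json
import urllib.request

# Assumptions (not stated in the PR): the server is reachable on localhost at the
# port mapped by `-p "7997":"7997"`, and it exposes an OpenAI-style /embeddings route.
BASE_URL = "http://localhost:7997"
MODEL_ID = "Alibaba-NLP/gte-Qwen2-7B-instruct"


def build_embedding_request(texts, model=MODEL_ID, base_url=BASE_URL):
    """Build (without sending) a POST request for the assumed /embeddings route."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=60.0):
    """Send the request and return one embedding (list of floats) per input text."""
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed OpenAI-style response shape: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    # Requires the container from the diff to be running locally.
    vectors = embed(["what is an inference server?"])
    print(len(vectors), len(vectors[0]))
```

Building the request separately from sending it keeps the payload inspectable without a running GPU container; only `embed` needs the server.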