---
license: apache-2.0
datasets:
- microsoft/ms_marco
language:
- en
pipeline_tag: text-classification
tags:
- onnx
- cross-encoder
---
# Cross-Encoder for MS Marco - ONNX
ONNX versions of [Sentence Transformers Cross Encoders](https://huggingface.co/cross-encoder) to allow ranking without heavy dependencies.
The models were trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.
The models can be used for Information Retrieval: given a query, score it against every candidate passage (e.g. retrieved with ElasticSearch), then sort the passages by score in decreasing order. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details.
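The re-ranking step can be sketched roughly as follows. This is a minimal illustration, not the model itself: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in so the snippet runs without any model files. In practice the score would come from the cross-encoder.

```python
import re
from typing import Callable, List, Tuple


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the passages best-first."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    # A higher score means more relevant, so sort in decreasing order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_overlap_score(query: str, passage: str) -> float:
    """Stand-in scorer (NOT the cross-encoder): number of shared words."""
    query_words = set(re.findall(r"\w+", query.lower()))
    passage_words = set(re.findall(r"\w+", passage.lower()))
    return float(len(query_words & passage_words))


candidates = [
    "New York City is famous for the Metropolitan Museum of Art.",
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
]
ranked = rerank("How many people live in Berlin?", candidates, toy_overlap_score)
for passage, score in ranked:
    print(f"{score:.1f}  {passage}")
```

Keeping the scorer as a parameter means the same sorting logic can be reused once a real model-backed scoring function is available.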
## Models Available
| Model Name | Precision | File Name | File Size |
|--------------------------------------|-----------|------------------------------------------|-----------|
| ms-marco-MiniLM-L-4-v2 ONNX | FP32 | ms-marco-MiniLM-L-4-v2-onnx.zip | 70 MB |
| ms-marco-MiniLM-L-4-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-4-v2-onnx-int8.zip | 12.8 MB |
| ms-marco-MiniLM-L-6-v2 ONNX | FP32 | ms-marco-MiniLM-L-6-v2-onnx.zip | 83.4 MB |
| ms-marco-MiniLM-L-6-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-6-v2-onnx-int8.zip | 15.2 MB |
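
The files above are zip archives, so they need to be unpacked before use. A minimal sketch using only the Python standard library is shown below; `extract_model` is a hypothetical helper, and the internal layout of the archives is an assumption (inspect the returned file names to see what was actually extracted).

```python
import zipfile
from pathlib import Path
from typing import List


def extract_model(archive_path: str, target_dir: str = ".") -> List[str]:
    """Unpack a downloaded model archive and return the names of its members."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        # The member names show where the .onnx and tokenizer files ended up.
        return archive.namelist()
```

For example, `extract_model("ms-marco-MiniLM-L-4-v2-onnx.zip")` would unpack the FP32 L-4 archive into the current directory, assuming it has already been downloaded there.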
## Usage with ONNX Runtime
```python
import onnxruntime as ort
from transformers import AutoTokenizer

model_path = "ms-marco-MiniLM-L-4-v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
# Pair the query with every passage and tokenize to NumPy arrays
features = tokenizer([query] * len(passages), passages, padding=True, truncation=True, return_tensors="np")
# run() takes a plain dict mapping input names to arrays
ort_outs = ort_sess.run(None, dict(features))
print(ort_outs)
```
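To turn the raw output into a ranking, the scores can be paired with their passages and sorted. The sketch below assumes the first model output holds one relevance logit per pair, i.e. shape `(batch, 1)`; `rank_by_logit` is a hypothetical helper, and the example values are placeholders, not real model output.

```python
from typing import List, Sequence, Tuple


def rank_by_logit(passages: Sequence[str], logits) -> List[Tuple[str, float]]:
    """Pair each passage with its score and sort best-first.

    Assumes `logits` is indexable as logits[i][0], i.e. shape (batch, 1),
    which works for a NumPy array as well as for a nested list.
    """
    if len(passages) != len(logits):
        raise ValueError("need exactly one logit row per passage")
    scored = [(passage, float(row[0])) for passage, row in zip(passages, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only, not real model output.
example_passages = ["passage A", "passage B"]
example_logits = [[-1.5], [3.0]]
ranking = rank_by_logit(example_passages, example_logits)
print(ranking)
```

With the session output from the usage example, the call would be `rank_by_logit(passages, ort_outs[0])`, where `passages` is the list of passage strings that was tokenized.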
## Performance
TBU...