---
license: apache-2.0
datasets:
- microsoft/ms_marco
language:
- en
pipeline_tag: text-classification
tags:
- onnx
- cross-encoder
---

# Cross-Encoder for MS Marco - ONNX

ONNX versions of [Sentence Transformers Cross Encoders](https://huggingface.co/cross-encoder) to allow ranking without heavy dependencies.

The models were trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.

The models can be used for Information Retrieval: given a query, encode the query with all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in decreasing order of score. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details.

## Models Available

| Model Name                              | Precision | File Name                            | File Size |
|-----------------------------------------|-----------|--------------------------------------|-----------|
| ms-marco-MiniLM-L-4-v2 ONNX             | FP32      | ms-marco-MiniLM-L-4-v2-onnx.zip      | 70 MB     |
| ms-marco-MiniLM-L-4-v2 ONNX (Quantized) | INT8      | ms-marco-MiniLM-L-4-v2-onnx-int8.zip | 12.8 MB   |
| ms-marco-MiniLM-L-6-v2 ONNX             | FP32      | ms-marco-MiniLM-L-6-v2-onnx.zip      | 83.4 MB   |
| ms-marco-MiniLM-L-6-v2 ONNX (Quantized) | INT8      | ms-marco-MiniLM-L-6-v2-onnx-int8.zip | 15.2 MB   |

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Local directory holding the tokenizer files and the .onnx model
model_path = "ms-marco-MiniLM-L-4-v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")

# Tokenize (query, passage) pairs: the i-th query is paired with the i-th passage
features = tokenizer(
    ['How many people live in Berlin?', 'How many people live in Berlin?'],
    ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
     'New York City is famous for the Metropolitan Museum of Art.'],
    padding=True,
    truncation=True,
    return_tensors="np",
)

# ONNX Runtime expects a plain dict mapping input names to NumPy arrays
ort_outs = ort_sess.run(None, dict(features))
print(ort_outs)
```

## Performance

To be updated.
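
## Example: Sorting Passages by Score

The re-ranking step described above (score every query-passage pair, then sort the passages in decreasing order) can be sketched as follows. This is a minimal illustration, not part of the released files: the helper name `rank_passages` is hypothetical, the logit values are made-up placeholders rather than real model output, and it assumes the model emits one relevance logit per pair, with shape `(n, 1)` or `(n,)`.

```python
import numpy as np


def rank_passages(passages, logits):
    """Return (passage, score) pairs sorted from highest to lowest score."""
    # Assumption: one logit per (query, passage) pair, shaped (n, 1) or (n,).
    scores = np.asarray(logits, dtype=np.float32).reshape(-1)
    if scores.shape[0] != len(passages):
        raise ValueError("expected exactly one score per passage")
    # Negate the scores so that argsort yields descending order.
    order = np.argsort(-scores, kind="stable")
    return [(passages[i], float(scores[i])) for i in order]


example_passages = [
    "New York City is famous for the Metropolitan Museum of Art.",
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
]
# Placeholder logits for illustration only; real values come from the model.
example_logits = np.array([[-4.0], [8.0]])

ranked = rank_passages(example_passages, example_logits)
for passage, score in ranked:
    print(f"{score:.2f}\t{passage}")
```

With the usage snippet above, the first element of `ort_outs` would likely be the array to pass as `logits`, provided the exported model has a single output.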