---
license: apache-2.0
datasets:
  - microsoft/ms_marco
language:
  - en
pipeline_tag: text-classification
tags:
  - onnx
  - cross-encoder
---

# Cross-Encoder for MS Marco - ONNX

ONNX versions of Sentence Transformers Cross Encoders.

The models were trained on the MS Marco Passage Ranking task.

The models can be used for Information Retrieval: given a query, encode the query with all candidate passages (e.g. retrieved with Elasticsearch), then sort the passages in decreasing order of score. See SBERT.net Retrieve & Re-rank for more details. The training code is available at SBERT.net Training MS Marco.
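The re-ranking step above can be sketched as follows. This is a minimal sketch, not the library's API: `score` is a placeholder for any cross-encoder scoring function, such as a call into the ONNX Runtime session shown in the Usage section.

```python
def rerank(query, passages, score):
    """Score each (query, passage) pair and return passages sorted by
    descending relevance.

    `score` is any function mapping (query, passage) -> float,
    e.g. a wrapper around an ONNX Runtime cross-encoder session.
    """
    scored = [(passage, score(query, passage)) for passage in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The cross-encoder scores every pair independently, so the candidate list should already be narrowed by a cheap first-stage retriever before re-ranking.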

## Models Available

| Model Name | Precision | File Name | File Size |
| --- | --- | --- | --- |
| ms-marco-MiniLM-L-4-v2 ONNX | FP32 | ms-marco-MiniLM-L-4-v2-onnx.zip | 70 MB |
| ms-marco-MiniLM-L-4-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-4-v2-onnx-int8.zip | 12.8 MB |
| ms-marco-MiniLM-L-6-v2 ONNX | FP32 | ms-marco-MiniLM-L-6-v2-onnx.zip | 83.4 MB |
| ms-marco-MiniLM-L-6-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-6-v2-onnx-int8.zip | 15.2 MB |

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

model_path = "ms-marco-MiniLM-L-4-v2-onnx/"

# Load the tokenizer from the local model directory
# (pass the variable, not the string literal 'model_path')
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")

# Tokenize (query, passage) pairs; return NumPy tensors for ONNX Runtime
features = tokenizer(
    ["How many people live in Berlin?", "How many people live in Berlin?"],
    ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
     "New York City is famous for the Metropolitan Museum of Art."],
    padding=True, truncation=True, return_tensors="np",
)

# Run inference: one relevance logit per (query, passage) pair
ort_outs = ort_sess.run(None, dict(features))
print(ort_outs)
```
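The raw outputs can then be turned into a ranking. A sketch, assuming the first output array holds one relevance logit per pair (shape `(batch, 1)` or `(batch,)`); the helper name `rank_passages` is illustrative, not part of the model package:

```python
import numpy as np

def rank_passages(passages, logits):
    """Sort passages by descending relevance logit.

    `logits` is assumed to be the first ONNX output, i.e. `ort_outs[0]`,
    with one score per passage.
    """
    scores = np.asarray(logits, dtype=np.float32).reshape(-1)
    order = np.argsort(-scores)  # indices of passages, best first
    return [(passages[i], float(scores[i])) for i in order]
```

For the Berlin example above, the population passage should receive the higher logit and therefore rank first.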

## Performance

TBU...