ONNX GPU Runtime with O4 for BAAI/bge-reranker-large
benchmark: https://colab.research.google.com/drive/1HP9GQKdzYa6H9SJnAZoxJWq920gxwd2k
Convert
!optimum-cli export onnx -m BAAI/bge-reranker-large --optimize O4 bge-reranker-large-onnx-o4 --device cuda
Usage
# pip install "optimum[onnxruntime-gpu]" transformers
import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the tokenizer and the O4-optimized ONNX model, then select the CUDA execution provider
tokenizer = AutoTokenizer.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model = ORTModelForSequenceClassification.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model.to("cuda")

# Each pair is [query, passage]; the model returns one relevance logit per pair
pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
print(scores)
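The printed scores are raw logits, so only their relative order is meaningful. The following is a minimal sketch of one way to turn them into a ranked list with scores in (0, 1) by applying a sigmoid. The helper names `sigmoid` and `rank_passages` are hypothetical, and the logit values are made up for illustration; in practice they would come from `scores.tolist()` in the usage snippet.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping a logit into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def rank_passages(passages, logits):
    """Return (passage, normalized score) pairs, most relevant first."""
    # Sigmoid is monotonic, so normalizing does not change the ordering.
    scored = [(p, sigmoid(z)) for p, z in zip(passages, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical logits standing in for scores.tolist(); not real model output.
example_passages = ["hi", "The giant panda is a bear species endemic to China."]
example_logits = [-5.0, 5.0]

ranked = rank_passages(example_passages, example_logits)
for passage, score in ranked:
    print(f"{score:.4f}  {passage}")
```

Sorting on the raw logits would give the same order; the sigmoid only helps when a bounded score is needed, for example to apply a fixed relevance threshold.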
Source model
BAAI/bge-reranker-large