BGE-Reranker-Large
This is an int8-converted version of bge-reranker-large. Thanks to CTranslate2, it should be at least 3 times faster than the original Hugging Face Transformers version, while also being smaller and with minimal performance loss.
Model Details
Unlike the embedding model bge-large-en-v1.5, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
In addition, this is a highly optimized version built with the CTranslate2 library, which makes it suitable for production environments.
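Because the raw relevance score is unbounded, it can help to squash it into the (0, 1) range when a normalized value is needed, for example for thresholding. The sketch below applies a logistic sigmoid to two example logits; the helper name to_probability is hypothetical and not part of this model or of CTranslate2, and the result is a monotone rescaling rather than a calibrated probability.

```python
import math


def to_probability(logit):
    """Map an unbounded relevance logit into (0, 1) with the logistic sigmoid."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Example logits (assumed values, matching the kind of output the reranker prints).
scores = [1.0474, -9.4694]
probs = [to_probability(s) for s in scores]
print(probs)
```

The sigmoid is monotonic, so the ranking of passages is the same before and after the mapping.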
Model Sources
This model is based on the BAAI BGE-Reranker model. Please visit the original bge-reranker repo for more details.
Usage
Simply pip install ctranslate2 (the example also needs transformers and torch), and then:
import ctranslate2
import transformers
import torch

device_mapping = "cuda" if torch.cuda.is_available() else "cpu"
model_dir = "hooman650/ct2fast-bge-reranker"

# ctranslate2 encoder does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device_mapping)

# the tokenizer and the classification head come from HF
model_name = "BAAI/bge-reranker-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier
classifier.eval()
classifier.to(device_mapping)

# each pair is [query, passage]
pairs = [
    ["I like Ctranslate2", "Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2", "Using naive transformers might not be suitable for deployment"],
]

with torch.no_grad():
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    output = encoder.forward_batch(tokens)
    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
    logits = classifier(hidden_state).squeeze()

print(logits)
# tensor([ 1.0474, -9.4694], device='cuda:0')
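Once the logits are available, reranking amounts to sorting the passages by score. The following is a minimal sketch of that step; rank_passages is a hypothetical helper, and the scores are hard-coded from the printed output above so the snippet runs without loading the model. In practice they would come from something like logits.tolist().

```python
def rank_passages(passages, scores):
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


passages = [
    "Ctranslate2 makes mid range models faster",
    "Using naive transformers might not be suitable for deployment",
]
# Assumed scores, copied from the example output above.
scores = [1.0474, -9.4694]

ranked = rank_passages(passages, scores)
for passage, score in ranked:
    print(f"{score:8.4f}  {passage}")
```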
Hardware
Supports both GPU and CPU.
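Since int8 support differs between devices, one possible way to load the encoder is to ask CTranslate2 which compute types the device supports and pick one explicitly. This is a sketch under assumptions: pick_compute_type and load_encoder are hypothetical helpers, and the preference order is only a guess at a reasonable default for int8 weights, not a recommendation from the model authors.

```python
def pick_compute_type(supported):
    """Return the first preferred compute type found in `supported`."""
    # Assumed preference order: int8 variants first, since the weights are int8.
    for candidate in ("int8_float16", "int8", "float16", "float32"):
        if candidate in supported:
            return candidate
    # "default" lets CTranslate2 choose the compute type itself.
    return "default"


def load_encoder(model_dir, device):
    """Load a CTranslate2 encoder with a compute type the device supports."""
    # Imported lazily so pick_compute_type can be used without ctranslate2 installed.
    import ctranslate2

    supported = ctranslate2.get_supported_compute_types(device)
    return ctranslate2.Encoder(
        model_dir, device=device, compute_type=pick_compute_type(supported)
    )
```

For example, load_encoder("hooman650/ct2fast-bge-reranker", "cpu") would typically resolve to an int8 compute type on CPUs that support it.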