BGE-Reranker-Large

This is an int8-quantized conversion of bge-reranker-large. Thanks to CTranslate2, it should be at least 3 times faster than the original Hugging Face Transformers version while also being smaller, with minimal performance loss.

Model Details

Unlike the embedding model bge-large-en-v1.5, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. In addition, this is a highly optimized version built with the CTranslate2 library, suitable for production environments.
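
Because the score is an unbounded logit, it can be handy to squash it into (0, 1) when a bounded value is needed, for example for thresholding. The sketch below is an illustration rather than part of this model: it applies a plain sigmoid to the two example scores shown in the Usage output of this card. The helper name `sigmoid` and the variable names are assumptions made for the example.

```python
import math


def sigmoid(score: float) -> float:
    """Map an unbounded relevance score to the open interval (0, 1)."""
    # Branch on the sign so math.exp() never receives a large positive
    # argument, which would raise OverflowError for very negative scores.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


# Example logits copied from the Usage output in this card.
raw_scores = [1.0474, -9.4694]
normalized = [sigmoid(s) for s in raw_scores]
print(normalized)
```

The sigmoid is monotonic, so the ranking of passages is unchanged; the first score lands at roughly 0.74 and the second very close to 0.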

Model Sources

The original model is based on the BAAI BGE-Reranker model. Please visit bge-reranker-original-repo for more details.

Usage

Simply pip install ctranslate2 (the example below also uses transformers and torch) and then run:

import ctranslate2
import transformers
import torch

device_mapping="cuda" if torch.cuda.is_available() else "cpu"

model_dir = "hooman650/ct2fast-bge-reranker"

# the CTranslate2 encoder does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device_mapping)

# the classification head comes from HF
model_name = "BAAI/bge-reranker-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier

classifier.eval()
classifier.to(device_mapping)

pairs = [
    ["I like Ctranslate2","Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2","Using naive transformers might not be suitable for deployment"]
]
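with torch.no_grad():
    # tokenize each (query, passage) pair into a list of token ids
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    # run the quantized encoder
    output = encoder.forward_batch(tokens)
    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
    # one logit per pair; squeeze only the last dim so a single pair still yields shape (1,)
    logits = classifier(hidden_state).squeeze(-1)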

print(logits)

# tensor([ 1.0474, -9.4694], device='cuda:0')
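
In a typical reranking step the scores are then used to order the candidate passages for a query. The following is a minimal sketch of that step, using the passages and scores from the example above; `rank_passages` is a hypothetical helper written for this illustration, not part of ctranslate2 or transformers.

```python
from typing import List, Sequence, Tuple


def rank_passages(
    passages: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    # Higher logit means more relevant, so sort in descending order.
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


# Passages and scores taken from the example above.
passages = [
    "Ctranslate2 makes mid range models faster",
    "Using naive transformers might not be suitable for deployment",
]
scores = [1.0474, -9.4694]

ranked = rank_passages(passages, scores)
for text, score in ranked:
    print(f"{score:8.4f}  {text}")
```

With real model output, `scores` could be obtained with `logits.tolist()`.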

Hardware

Supports both GPU and CPU.
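
As a rough sketch of picking the device, the snippet below asks CTranslate2 itself whether a CUDA device is visible and falls back to the CPU otherwise. `pick_device` is a hypothetical helper introduced for this example. It only covers the encoder side: the classification head in the usage example is still a torch module, so torch must also be able to use the chosen device.

```python
def pick_device() -> str:
    """Return "cuda" if CTranslate2 can see a GPU, otherwise "cpu"."""
    try:
        import ctranslate2
    except ImportError:
        # Assumption for this sketch: without ctranslate2 we just report CPU.
        return "cpu"
    return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"


device = pick_device()
print(device)
```

The returned string can be passed as the `device` argument of `ctranslate2.Encoder`, in place of `device_mapping` in the usage example.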