# MS Marco Ranking with ColBERT on Vespa.ai

This model is based on [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832).

The underlying BERT model is [google/bert_uncased_L-8_H-512_A-8](https://huggingface.co/google/bert_uncased_L-8_H-512_A-8), trained using the original [ColBERT training routine](https://github.com/stanford-futuredata/ColBERT/).

The model weights were tuned by training on `triples.train.small.tar.gz` from [MSMARCO-Passage-Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking).

To use this model with Vespa.ai for MS Marco Passage Ranking, see the
[MS Marco Ranking using Vespa.ai sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking).

# MS Marco Passage Ranking

| MS Marco Passage Ranking Query Set | MRR@10 ColBERT on Vespa.ai |
|------------------------------------|----------------------------|
| Dev                                | 0.354                      |
| Eval                               | 0.347                      |

For comparison, the official BM25 baseline ranking model reaches an MRR@10 of 0.16 on the eval query set and 0.167 on the dev query set.
See the [MS Marco Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/).
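
For readers unfamiliar with the metric, MRR@10 is the mean, over queries, of the reciprocal rank of the first relevant passage within the top 10 results (0 if none appears). The following is a minimal sketch of that definition, not the official MS Marco evaluation script; the function name `mrr_at_10` and the toy ids are hypothetical, and the sketch assumes the average is taken over the queries present in `rankings`.

```python
from typing import Dict, List, Set


def mrr_at_10(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]]) -> float:
    """Mean reciprocal rank of the first relevant passage within the top 10."""
    total = 0.0
    for query_id, ranked_passages in rankings.items():
        relevant_ids = relevant.get(query_id, set())
        for rank, passage_id in enumerate(ranked_passages[:10], start=1):
            if passage_id in relevant_ids:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings) if rankings else 0.0


# Toy data with hypothetical ids (not taken from MS Marco)
rankings = {"q1": ["p3", "p7", "p1"], "q2": ["p9", "p2"], "q3": ["p5"]}
relevant = {"q1": {"p7"}, "q2": {"p9"}, "q3": {"p8"}}
score = mrr_at_10(rankings, relevant)
```

In this toy example the first relevant passage is at rank 2 for `q1`, at rank 1 for `q2`, and missing for `q3`, so the three reciprocal ranks are 0.5, 1.0 and 0.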

## Export ColBERT query encoder to ONNX

The ColBERT query encoder runs inside the Vespa runtime, where it maps the textual query representation to its tensor representation. This relies on Vespa's support for evaluating ONNX models. The following snippet exports the model for serving.

```python
from transformers import BertModel
from transformers import BertPreTrainedModel
from transformers import BertConfig
import torch
import torch.nn as nn


class VespaColBERT(BertPreTrainedModel):

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        # Project each contextual token embedding down to 32 dimensions
        self.linear = nn.Linear(config.hidden_size, 32, bias=False)
        self.init_weights()

    def forward(self, input_ids, attention_mask):
        # [0] selects the last hidden state: (batch, tokens, hidden_size)
        Q = self.bert(input_ids, attention_mask=attention_mask)[0]
        Q = self.linear(Q)
        # L2-normalize every token vector along the embedding dimension
        return torch.nn.functional.normalize(Q, p=2, dim=2)


colbert_query_encoder = VespaColBERT.from_pretrained("vespa-engine/colbert-medium")

# Export model to ONNX for serving in Vespa

input_names = ["input_ids", "attention_mask"]
output_names = ["contextual"]
# Dummy input used for tracing: at most 32 query tokens
input_ids = torch.ones(1, 32, dtype=torch.int64)
attention_mask = torch.ones(1, 32, dtype=torch.int64)
args = (input_ids, attention_mask)
torch.onnx.export(colbert_query_encoder,
                  args=args,
                  f="query_encoder_colbert.onnx",
                  input_names=input_names,
                  output_names=output_names,
                  dynamic_axes={
                      "input_ids": {0: "batch"},
                      "attention_mask": {0: "batch"},
                      "contextual": {0: "batch"},
                  },
                  opset_version=11)
```
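
The exported encoder emits one L2-normalized 32-dimensional vector per query token. In ColBERT's late interaction, a passage is scored by taking, for each query token vector, the maximum dot product against all passage token vectors, and summing those maxima. The sketch below illustrates that scoring in plain Python on toy 2-dimensional vectors. It is an illustration only, not the ranking expression used in the Vespa sample app, and the names `dot` and `max_sim_score` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim_score(query_vectors: List[Vector], passage_vectors: List[Vector]) -> float:
    """Sum, over query tokens, of the best dot product against any passage token.

    Assumes passage_vectors is non-empty; max() raises ValueError otherwise.
    """
    return sum(
        max(dot(q, d) for d in passage_vectors)
        for q in query_vectors
    )


# Toy unit vectors in 2 dimensions (the exported model emits 32 dimensions per token)
query = [[1.0, 0.0], [0.0, 1.0]]
passage = [[1.0, 0.0], [0.6, 0.8]]
score = max_sim_score(query, passage)
```

Because the vectors have unit length, each dot product is a cosine similarity, so a query token contributes at most 1.0 to the score.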

# Representing the model on Vespa.ai

See [Ranking with ONNX models](https://docs.vespa.ai/documentation/onnx.html) and the [MS Marco Ranking sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking).