# InRanker-small (60M parameters)

InRanker is a version of monoT5 distilled from [monoT5-3B](https://huggingface.co/castorini/monot5-3b-msmarco-10k) with increased effectiveness on out-of-domain scenarios.
Our key insight was to use language models and rerankers to generate as much
synthetic "in-domain" training data as possible, i.e., data that closely resembles
the data that will be seen at retrieval time. The pipeline used for training consists of
two distillation phases that do not require additional user queries
or manual annotations: (1) training on existing supervised soft
teacher labels, and (2) training on teacher soft labels for synthetic
queries generated using a large language model.
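
Both phases train the student on soft labels produced by the teacher rather than on hard relevance judgments. The sketch below illustrates that idea only and is not the authors' training code: it assumes each query-document pair is represented by a pair of logits (for the "false" and "true" tokens, as in monoT5-style rerankers) and assumes a mean squared error objective. The function name `mse_distillation_loss` is hypothetical; see the paper for the loss actually used.

```python
from typing import Sequence, Tuple

LogitPair = Tuple[float, float]  # (false_logit, true_logit) for one query-document pair


def mse_distillation_loss(
    student_logits: Sequence[LogitPair],
    teacher_logits: Sequence[LogitPair],
) -> float:
    """Mean squared error between student and teacher logits over a batch.

    Assumption: MSE on raw logits is used here purely for illustration.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher batches must have the same size")
    if not student_logits:
        raise ValueError("batch must not be empty")

    total = 0.0
    count = 0
    for student_pair, teacher_pair in zip(student_logits, teacher_logits):
        for s, t in zip(student_pair, teacher_pair):
            total += (s - t) ** 2
            count += 1
    return total / count
```

In phase (1) the teacher logits would come from pairs in an existing supervised dataset, and in phase (2) from pairs whose queries were generated by a language model; the loss itself does not change between the two phases in this sketch.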

The paper with further details can be found [here](https://arxiv.org/abs/2401.06910). The code and library are available at
https://github.com/unicamp-dl/InRanker

## Usage
The library was tested with Python 3.10 and can be installed with:
```bash
pip install inranker
```

The code for inference is:
```python
from inranker import T5Ranker

model = T5Ranker(model_name_or_path="unicamp-dl/InRanker-small")

docs = [
    "The capital of France is Paris",
    "Learn deep learning with InRanker and transformers"
]
scores = model.get_scores(
    query="What is the best way to learn deep learning?",
    docs=docs
)
# Each score falls in the range [0, 1]. Pair the scores with their documents
# and sort in descending order (most relevant to least).
sorted_scores = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)

""" InRanker-small:
sorted_scores = [
    (0.4844, 'Learn deep learning with InRanker and transformers'),
    (7.83e-06, 'The capital of France is Paris')
]
"""
```
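
For reranking a larger candidate list, the sorting step can be wrapped in a small helper. This is a minimal sketch, assuming the scorer returns one score per document in input order, as in the example above. The names `rerank` and `overlap_scorer` are hypothetical and not part of the `inranker` library; `overlap_scorer` is a toy stand-in so the snippet runs without downloading a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

ScoreFn = Callable[[str, Sequence[str]], Sequence[float]]


def rerank(
    query: str,
    docs: Sequence[str],
    score_fn: ScoreFn,
    top_k: Optional[int] = None,
) -> List[Tuple[float, str]]:
    """Return (score, doc) pairs sorted from most to least relevant."""
    scores = score_fn(query, docs)
    if len(scores) != len(docs):
        raise ValueError("score_fn must return exactly one score per document")
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def overlap_scorer(query: str, docs: Sequence[str]) -> List[float]:
    """Toy stand-in scorer: fraction of query words that appear in each document."""
    query_words = set(query.lower().split())
    return [
        len(query_words & set(doc.lower().split())) / len(query_words)
        for doc in docs
    ]


top = rerank(
    "deep learning",
    ["paris france", "deep learning book", "deep sea"],
    overlap_scorer,
    top_k=2,
)
```

To use the model instead of the toy scorer, pass `lambda q, d: model.get_scores(query=q, docs=d)` as `score_fn`, with `model` created as in the inference example.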

## How to Cite
```
@misc{laitz2024inranker,
      title={InRanker: Distilled Rankers for Zero-shot Information Retrieval}, 
      author={Thiago Laitz and Konstantinos Papakostas and Roberto Lotufo and Rodrigo Nogueira},
      year={2024},
      eprint={2401.06910},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
```