---
model-index:
- name: ListConRanker
results:
- dataset:
config: default
name: MTEB CMedQAv1-reranking (default)
revision: null
split: test
type: C-MTEB/CMedQAv1-reranking
metrics:
- type: map
value: 90.55366308098787
- type: mrr_1
value: 87.8
- type: mrr_10
value: 92.45134920634919
- type: mrr_5
value: 92.325
- type: main_score
value: 90.55366308098787
task:
type: Reranking
- dataset:
config: default
name: MTEB CMedQAv2-reranking (default)
revision: null
split: test
type: C-MTEB/CMedQAv2-reranking
metrics:
- type: map
value: 89.38076135722042
- type: mrr_1
value: 85.9
- type: mrr_10
value: 91.28769841269842
- type: mrr_5
value: 91.08999999999999
- type: main_score
value: 89.38076135722042
task:
type: Reranking
- dataset:
config: default
name: MTEB MMarcoReranking (default)
revision: null
split: dev
type: C-MTEB/Mmarco-reranking
metrics:
- type: map
value: 43.881461866703894
- type: mrr_1
value: 32
- type: mrr_10
value: 44.700793650793656
- type: mrr_5
value: 43.61666666666667
- type: main_score
value: 43.881461866703894
task:
type: Reranking
- dataset:
config: default
name: MTEB T2Reranking (default)
revision: null
split: dev
type: C-MTEB/T2Reranking
metrics:
- type: map
value: 69.16513825032682
- type: mrr_1
value: 67.41706161137441
- type: mrr_10
value: 80.0946053776961
- type: mrr_5
value: 79.71676822387724
- type: main_score
value: 69.16513825032682
task:
type: Reranking
tags:
- mteb
---
# ListConRanker

## Model
- We propose a Listwise-encoded Contrastive text reRanker (ListConRanker), which includes a ListTransformer module for listwise encoding. The ListTransformer facilitates learning global contrastive information between passage features, including the clustering of similar passages, the clustering of dissimilar passages, and the distinction between similar and dissimilar passages. In addition, we propose ListAttention to help the ListTransformer preserve the query features while learning global comparative information.
- The training loss function is Circle Loss [1]. Compared with cross-entropy loss and ranking loss, it mitigates low data efficiency and non-smooth gradient changes.
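As a rough illustration, Circle Loss over a query's positive-pair similarities `sp` and negative-pair similarities `sn` can be sketched in NumPy as below. This is a minimal rendering of the formula from [1]; the variable names and the default margin/scale values are ours, not the authors' training configuration.

```python
import numpy as np

def circle_loss(sp, sn, m=0.25, gamma=32.0):
    """Circle Loss [1] for one query.
    sp: array of similarities to positive passages
    sn: array of similarities to negative passages
    m: relaxation margin, gamma: scale factor (illustrative defaults)."""
    ap = np.clip(1 + m - sp, 0, None)      # alpha_p = [O_p - s_p]_+, O_p = 1 + m
    an = np.clip(sn + m, 0, None)          # alpha_n = [s_n - O_n]_+, O_n = -m
    delta_p, delta_n = 1 - m, m            # decision margins
    logit_p = -gamma * ap * (sp - delta_p)
    logit_n = gamma * an * (sn - delta_n)
    # log(1 + sum_n exp(logit_n) * sum_p exp(logit_p))
    return float(np.log1p(np.exp(logit_n).sum() * np.exp(logit_p).sum()))
```

Because the weights `ap` and `an` adapt to how far each similarity is from its optimum, well-separated pairs contribute little gradient while ambiguous pairs dominate, which is the smoothness advantage mentioned above.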
## Data
The training data consists of approximately 2.6 million queries, each paired with multiple passages. It is drawn from the training sets of several datasets, including cMedQA1.0, cMedQA2.0, MMarcoReranking, T2Reranking, huatuo, MARC, XL-sum, and CSL, among others.
## Training
We trained the model in two stages. In the first stage, we freeze the parameters of the embedding model and train only the ListTransformer for 4 epochs with a batch size of 1024. In the second stage, we unfreeze all parameters and train for another 2 epochs with a batch size of 256.
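The two-stage schedule can be summarized as a small configuration sketch. The module names (`embedding`, `list_transformer`) and the structure are illustrative, not the released training code:

```python
# Two-stage schedule as described above; module names are illustrative.
STAGES = [
    # stage 1: embedding model frozen, only ListTransformer updated
    {"train_modules": {"list_transformer"}, "epochs": 4, "batch_size": 1024},
    # stage 2: all parameters trainable
    {"train_modules": {"embedding", "list_transformer"}, "epochs": 2, "batch_size": 256},
]

def is_trainable(module: str, stage: dict) -> bool:
    """True if `module`'s parameters receive gradient updates in `stage`."""
    return module in stage["train_modules"]
```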
## Inference
Due to limited GPU memory, we feed about 20 passages per query during training. In actual use, however, far more than 20 passages may be input at once (e.g., MMarcoReranking).
To reduce this discrepancy between training and inference, we propose iterative inference. Iterative inference feeds the passages into the ListConRanker multiple times; each pass decides only the ranking of the passages at the end of the list.
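A minimal sketch of this procedure is given below, assuming a `score_fn(query, passages)` that returns one relevance score per passage. The window size, helper names, and the one-passage-per-round elimination are our assumptions for illustration, not the released implementation:

```python
def iterative_rank(query, passages, score_fn, window=20):
    """Sketch of iterative inference: while more passages remain than fit in
    one training-sized window, re-score the pool and fix only the rank of the
    lowest-scored passage; the final pass ranks the survivors directly."""
    pool = list(range(len(passages)))
    tail = []  # indices whose final (low) rank is fixed, worst first
    while len(pool) > window:
        scores = score_fn(query, [passages[i] for i in pool])
        worst = min(range(len(pool)), key=lambda j: scores[j])
        tail.append(pool.pop(worst))
    scores = score_fn(query, [passages[i] for i in pool])
    head = [i for _, i in sorted(zip(scores, pool), reverse=True)]
    return head + tail[::-1]  # indices from most to least relevant
```

Because every scoring pass sees a list no longer than the training window, the model never has to compare more passages at once than it did during training.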
## Performance
All values are MAP (%).

| Model | cMedQA1.0 | cMedQA2.0 | MMarcoReranking | T2Reranking | Avg. |
|---|---|---|---|---|---|
| LdIR-Qwen2-reranker-1.5B | 86.50 | 87.11 | 39.35 | 68.84 | 70.45 |
| zpoint-large-embedding-zh | 91.11 | 90.07 | 38.87 | 69.29 | 72.34 |
| xiaobu-embedding-v2 | 90.96 | 90.41 | 39.91 | 69.03 | 72.58 |
| Conan-embedding-v1 | 91.39 | 89.72 | 41.58 | 68.36 | 72.76 |
| ListConRanker | 90.55 | 89.38 | 43.88 | 69.17 | 73.25 |
| - w/o Iterative Inference | 90.20 | 89.98 | 37.52 | 69.17 | 71.72 |
## How to use
```python
from modules.listconranker import ListConRanker

reranker = ListConRanker('./ListConRanker_ckpt', use_fp16=True, list_transformer_layer=2)

# each sample is [query, passage_1, passage_2, ..., passage_n]
# the example queries and passages below are in Chinese, the model's target language
batch = [
    [
        '皮蛋是寒性的食物吗', # query
        '营养医师介绍皮蛋是属于凉性的食物,中医认为皮蛋可治眼疼、牙疼、高血压、耳鸣眩晕等疾病。体虚者要少吃。', # passage_1
        '皮蛋这种食品是在中国地域才常见的传统食品,它的生长汗青也是非常的悠长。', # passage_2
        '喜欢皮蛋的人会觉得皮蛋是最美味的食物,不喜欢皮蛋的人则觉得皮蛋是黑暗料理,尤其很多外国朋友都不理解我们吃皮蛋的习惯' # passage_3
    ],
    [
        '月有阴晴圆缺的意义', # query
        '形容的是月所有的状态,晴朗明媚,阴沉混沌,有月圆时,但多数时总是有缺陷。', # passage_1
        '人有悲欢离合,月有阴晴圆缺这句话意思是人有悲欢离合的变迁,月有阴晴圆缺的转换。', # passage_2
        '既然是诗歌,又哪里会有真正含义呢? 大概可以说:人生有太多坎坷,苦难,从容坦荡面对就好。', # passage_3
        '一零七六年苏轼贬官密州,时年四十一岁的他政治上很不得志,时值中秋佳节,非常想念自己的弟弟子由内心颇感忧郁,情绪低沉,有感而发写了这首词。' # passage_4
    ]
]

# conventional inference: please manage the batch size yourself
scores = reranker.compute_score(batch)
print(scores)
# [[0.5126953125, 0.331298828125, 0.3642578125], [0.63671875, 0.71630859375, 0.42822265625, 0.35302734375]]

# iterative inference: only a batch size of 1 is supported;
# the scores do not indicate similarity and are intended only for ranking
scores = reranker.iterative_inference(batch[0])
print(scores)
# [0.5126953125, 0.331298828125, 0.3642578125]
```
To reproduce the results with iterative inference, please run:

```bash
python3 eval_listconranker_iterative_inference.py
```

To reproduce the results without iterative inference, please run:

```bash
python3 eval_listconranker.py
```
## Reference

1. Circle Loss: https://arxiv.org/abs/2002.10857
2. FlagEmbedding: https://github.com/FlagOpen/FlagEmbedding
3. https://arxiv.org/abs/2408.15710
## License
This work is licensed under the MIT License, and the model weights are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.