Basic Information
This is the Dr. DECR model used in the XOR-TyDi leaderboard Task 1 whitebox submission.
https://nlp.cs.washington.edu/xorqa/
A detailed description of the model can be found in the paper:
https://arxiv.org/pdf/2112.08185.pdf
Source code to train the model is available via PrimeQA's IR component: https://github.com/primeqa/primeqa/tree/updated-documentation-readme/primeqa/ir/dense/colbert_top
It is a neural IR model built on top of the ColBERT v1 API and is not directly compatible with the Hugging Face API. Inference results on the XOR-TyDi dev set:
Language  R@2kt  R@5kt
te        79.41  83.19
bn        77.96  82.89
fi        65.92  72.61
ja        63.07  67.63
ko        60.35  68.07
ru        60.76  68.35
ar        65.70  73.14
Avg       67.60  73.70
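For intuition, ColBERT-style retrieval (on which Dr. DECR is built) scores a document by "late interaction": each query token embedding is matched against its best-matching document token embedding, and these maxima are summed. A minimal sketch with toy 2-D embeddings (the real model produces one high-dimensional vector per subword token; names and values here are illustrative only):

```python
# Toy sketch of ColBERT-style late-interaction ("MaxSim") scoring.
# Embeddings are hypothetical 2-D vectors for illustration.

def maxsim_score(query_embs, doc_embs):
    """Sum over query tokens of the max dot product with any doc token."""
    score = 0.0
    for q in query_embs:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_embs)
        score += best
    return score

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]   # tokens align well with the query
doc_b = [[0.1, 0.1], [0.0, 0.2]]   # tokens align poorly

print(maxsim_score(query, doc_a))  # 1.7 -> ranked above doc_b
print(maxsim_score(query, doc_b))  # 0.3
```

Documents are then ranked by this score; the recall numbers above measure whether a passage containing the answer appears within the top 2k or 5k retrieved tokens.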
Limitations and Bias
This model uses the pre-trained XLM-RoBERTa (XLM-R) model and was fine-tuned on the 7 languages of the XOR-TyDi leaderboard; performance on other languages was not tested.
Since the model was fine-tuned on a large pre-trained language model, XLM-RoBERTa, biases present in the pre-existing XLM-RoBERTa model may also be present in our fine-tuned model, Dr. DECR.
Citation
@article{Li2021_DrDecr,
  doi       = {10.48550/ARXIV.2112.08185},
  url       = {https://arxiv.org/abs/2112.08185},
  author    = {Li, Yulong and Franz, Martin and Sultan, Md Arafat and Iyer, Bhavani and Lee, Young-Suk and Sil, Avirup},
  keywords  = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title     = {Learning Cross-Lingual IR from an English Retriever},
  publisher = {arXiv},
  year      = {2021}
}