Basic Information
This is the Dr. DECR (Dense Retrieval with Distillation-Enhanced Cross-Lingual Representation) model used in the XOR-TyDi leaderboard Task 1 whitebox submission:
https://nlp.cs.washington.edu/xorqa/
A detailed description of the model can be found in:
https://arxiv.org/pdf/2112.08185.pdf
Source code for training the model is available via PrimeQA's IR component: https://github.com/primeqa/primeqa/tree/main/examples/drdecr
It is a neural IR model built on top of the ColBERT v1 API and is not directly compatible with the Hugging Face API. At its core is ColBERT-style late interaction: query and passage token embeddings are compared with a MaxSim operator, sketched below.
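As background, here is a minimal sketch of MaxSim scoring as used by ColBERT-style retrievers, assuming L2-normalized token embeddings. The function name, tensor shapes, and embedding dimension are illustrative, not the actual PrimeQA/ColBERT implementation.

```python
import torch

def maxsim_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late-interaction score (illustrative sketch).

    Q: (num_query_tokens, dim)  L2-normalized query token embeddings
    D: (num_doc_tokens, dim)    L2-normalized passage token embeddings
    Each query token is matched to its most similar passage token;
    the per-token maxima are summed to give the passage score.
    """
    sim = Q @ D.T                       # (num_query_tokens, num_doc_tokens) cosine similarities
    return sim.max(dim=1).values.sum()  # MaxSim: max over passage tokens, sum over query tokens

# Illustrative usage with random (hypothetical) embeddings
Q = torch.nn.functional.normalize(torch.randn(32, 128), dim=-1)
D = torch.nn.functional.normalize(torch.randn(180, 128), dim=-1)
print(maxsim_score(Q, D))
```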
Retrieval results on the XOR-TyDi dev set:

Language  R@2kt  R@5kt
te        66.67  70.88
bn        70.23  75.08
fi        82.24  86.18
ja        65.92  72.93
ko        67.93  71.73
ru        63.07  69.71
ar        78.15  82.77
Avg       70.60  75.61
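R@2kt and R@5kt follow the XOR-Retrieve definition: the fraction of questions for which a correct answer appears within the top-ranked retrieved passages totaling at most 2,000 (respectively 5,000) tokens. Below is a minimal illustrative sketch of that computation, assuming pre-tokenized passages and simple substring matching; the official XOR-TyDi scorer should be used for reported numbers.

```python
def recall_at_k_tokens(results, k_tokens=2000):
    """Approximate XOR-Retrieve R@kt (illustrative, not the official scorer).

    results: list of (ranked_passages, answers) pairs, where ranked_passages
             is a list of passages (each a list of tokens, best first) and
             answers is a list of acceptable answer strings.
    """
    hits = 0
    for ranked_passages, answers in results:
        budget = k_tokens
        window = []
        for tokens in ranked_passages:
            if budget <= 0:
                break
            window.extend(tokens[:budget])  # keep only tokens within the budget
            budget -= len(tokens)
        text = " ".join(window)
        if any(answer in text for answer in answers):
            hits += 1
    return hits / len(results)

# Hypothetical usage: one query, two ranked passages, one gold answer
results = [([["tokyo", "is", "the", "capital"], ["of", "japan"]], ["capital"])]
print(recall_at_k_tokens(results, k_tokens=2000))  # 1.0
```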
Limitations and Bias
This model uses the pre-trained XLM-RoBERTa base model and was fine-tuned on the 7 languages of the XOR-TyDi leaderboard; performance on other languages was not tested.
Since the model was fine-tuned from the large pre-trained language model XLM-RoBERTa, biases associated with XLM-RoBERTa may be present in the fine-tuned model, Dr. DECR.
Citation
@article{Li2021_DrDecr,
  doi       = {10.48550/ARXIV.2112.08185},
  url       = {https://arxiv.org/abs/2112.08185},
  author    = {Li, Yulong and Franz, Martin and Sultan, Md Arafat and Iyer, Bhavani and Lee, Young-Suk and Sil, Avirup},
  keywords  = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title     = {Learning Cross-Lingual IR from an English Retriever},
  publisher = {arXiv},
  year      = {2021}
}