|
# Basic Information |
|
|
|
This is the Dr. Decr model used in the XOR-TyDi leaderboard Task 1 (whitebox) submission. |
|
|
|
https://nlp.cs.washington.edu/xorqa/ |
|
|
|
|
|
The model is described in detail in: |
|
|
|
https://arxiv.org/pdf/2112.08185.pdf |
|
|
|
Source code to train the model is available in PrimeQA's IR component: |
|
https://github.com/primeqa/primeqa/tree/updated-documentation-readme/primeqa/ir/dense/colbert_top |
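
Because the checkpoint is loaded through that ColBERT-based code rather than the standard `transformers` auto classes (see the note below), one way to obtain the raw checkpoint files is via `huggingface_hub`. This is only a minimal sketch; the `repo_id` below is a placeholder, not a confirmed repository id for this model:

```python
# Minimal sketch: fetch the Dr. Decr checkpoint files from the Hugging Face Hub
# so they can be passed to PrimeQA's colbert_top loading / indexing scripts.
# NOTE: "<this-model-repo-id>" is a placeholder, not a confirmed repository id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="<this-model-repo-id>")
print(local_dir)  # local path containing the downloaded checkpoint files
```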
|
|
|
It is a neural IR model built on top of the ColBERTv1 API and is not directly compatible with the Hugging Face `transformers` API. Its retrieval results on the XOR-TyDi dev set, measured as R@2kt and R@5kt (recall within the top 2k and 5k retrieved tokens), are shown below; a sketch of the underlying ColBERT-style scoring follows the table. |
|
| Language | R@2kt | R@5kt |
| :--- | ---: | ---: |
| te | 79.41 | 83.19 |
| bn | 77.96 | 82.89 |
| fi | 65.92 | 72.61 |
| ja | 63.07 | 67.63 |
| ko | 60.35 | 68.07 |
| ru | 60.76 | 68.35 |
| ar | 65.70 | 73.14 |
| Avg | 67.60 | 73.70 |
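
For readers unfamiliar with ColBERT-style retrieval, the sketch below illustrates the "late interaction" (MaxSim) scoring that ColBERTv1, and therefore Dr. Decr, uses to rank passages: each query token embedding is matched against its most similar passage token embedding, and the per-token maxima are summed. This is a self-contained illustration of the technique only, not the PrimeQA inference code; the random tensors merely stand in for token embeddings produced by the XLM-R based encoder.

```python
# Illustrative sketch of ColBERT-style late interaction (MaxSim) scoring.
# The real model is loaded and run through PrimeQA's colbert_top code.
import torch
import torch.nn.functional as F

def maxsim_score(query_emb: torch.Tensor, passage_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT late-interaction score.

    query_emb:   [num_query_tokens, dim] token embeddings for one query
    passage_emb: [num_passage_tokens, dim] token embeddings for one passage
    """
    # Cosine similarity via dot products of L2-normalized embeddings.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    sim = q @ p.T  # [num_query_tokens, num_passage_tokens]
    # For each query token, keep its best-matching passage token (MaxSim),
    # then sum over query tokens to get the passage score.
    return sim.max(dim=-1).values.sum()

# Toy example with random embeddings standing in for encoder outputs.
torch.manual_seed(0)
query = torch.randn(12, 128)                          # a 12-token query
passages = [torch.randn(180, 128) for _ in range(3)]  # three candidate passages
scores = [maxsim_score(query, p).item() for p in passages]
print(scores)  # higher score = more relevant under late interaction
```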
|
|
|
# Limitations and Bias |
|
|
|
This model starts from a pre-trained XLM-RoBERTa (XLM-R) model and was fine-tuned on the seven languages of the XOR-TyDi leaderboard. Performance on other languages was not tested. |
|
|
|
Since the model was fine-tuned from the large pre-trained language model XLM-RoBERTa, biases associated with XLM-RoBERTa may also be present in our fine-tuned model, Dr. Decr. |
|
|
|
# Citation |
|
``` |
@article{Li2021_DrDecr,
  doi = {10.48550/ARXIV.2112.08185},
  url = {https://arxiv.org/abs/2112.08185},
  author = {Li, Yulong and Franz, Martin and Sultan, Md Arafat and Iyer, Bhavani and Lee, Young-Suk and Sil, Avirup},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Learning Cross-Lingual IR from an English Retriever},
  publisher = {arXiv},
  year = {2021}
}
``` |
|
|