License: MIT

Translation Tables for Probabilistic Structured Queries

This repository contains the raw translation tables for the fast_psq package. Please refer to the fast_psq GitHub repository for more information. The following is a brief example of using the tables.
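For background, a probabilistic structured query (PSQ) replaces each query term with its possible translations, weighted by translation probabilities from a table like the ones here. The sketch below shows the classic PSQ estimate of a query term's expected frequency in a document written in another language, TF(q, d) = Σ_f P(f | q) · tf(f, d). The nested-dictionary table layout, the function name `psq_term_frequency`, and the toy probabilities are assumptions for illustration; this is not the fast_psq implementation, which may apply the tables differently (for example, at indexing time).

```python
from collections import Counter
from typing import Dict, List

# Hypothetical table layout (assumption): query-language term ->
# {document-language term: P(document term | query term)}.
TranslationTable = Dict[str, Dict[str, float]]


def psq_term_frequency(query_term: str, doc_tokens: List[str], table: TranslationTable) -> float:
    """Expected frequency of `query_term` in a document written in another language.

    Classic PSQ estimate: TF(q, d) = sum over f of P(f | q) * tf(f, d).
    """
    doc_tf = Counter(doc_tokens)  # raw term frequencies tf(f, d)
    translations = table.get(query_term, {})  # unknown query terms contribute nothing
    return sum(prob * doc_tf[f] for f, prob in translations.items())


# Toy example with made-up probabilities.
table = {"bank": {"银行": 0.5, "岸": 0.25, "河岸": 0.25}}
print(psq_term_frequency("bank", ["银行", "利率", "银行", "岸"], table))  # 1.25
```

Document frequency is usually estimated the same way, by summing translation-weighted document frequencies of the alternatives.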

Get started

fast_psq is available on PyPI.

pip install fast_psq ir_datasets ir_measures

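The following is an example indexing command. It builds a compressed index of the Chinese NeuCLIR collection (loaded through ir_datasets) using the Chinese translation table zh.table.dict.gz.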

python -m fast_psq.index \
--doc_file irds:neuclir/1/zh/trec-2022 \
--lang zh \
--psq_file hltcoe/psq_translation_tables:zh.table.dict.gz \
--min_translation_prob 0.00010 \
--max_translation_alternatives 64 \
--max_translation_cdf 0.99 \
--docid doc_id \
--title title \
--body text \
--output_dir ./indexes/neuclir-zh.f32/ \
--compression \
--nworkers 64
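The three pruning flags in the indexing command limit how many translations are kept per term. One plausible reading of them, assumed here rather than taken from the fast_psq source, is: visit translations from most to least probable, drop any below --min_translation_prob, keep at most --max_translation_alternatives, and stop once the kept probability mass reaches --max_translation_cdf. The helper `prune_translations` and the toy terms are hypothetical.

```python
from typing import Dict


def prune_translations(
    alternatives: Dict[str, float],
    min_prob: float = 1e-4,        # plays the role of --min_translation_prob
    max_alternatives: int = 64,    # plays the role of --max_translation_alternatives
    max_cdf: float = 0.99,         # plays the role of --max_translation_cdf
) -> Dict[str, float]:
    """Hypothetical pruning of one term's translation distribution."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    # Visit translations from most to least probable.
    for term, prob in sorted(alternatives.items(), key=lambda kv: kv[1], reverse=True):
        # All three conditions only get worse further down the sorted list,
        # so stop at the first failure.
        if prob < min_prob or len(kept) >= max_alternatives or cumulative >= max_cdf:
            break
        kept[term] = prob
        cumulative += prob  # the translation that crosses max_cdf is still kept
    return kept


# Toy distribution with made-up terms and probabilities.
alternatives = {"t1": 0.5, "t2": 0.25, "t3": 0.125, "t4": 0.0625, "t5": 0.00005}
print(prune_translations(alternatives))                      # t1..t4; t5 is below min_prob
print(prune_translations(alternatives, max_cdf=0.8))         # t1, t2, t3
print(prune_translations(alternatives, max_alternatives=2))  # t1, t2
```

Whether fast_psq renormalizes the surviving probabilities after pruning is not stated here; this sketch leaves them unchanged.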

The following is an example search command. It issues the English titles of the NeuCLIR 2022 topics against the index built above.

python -m fast_psq.search \
--query_source irds:neuclir/1/zh/trec-2022 \
--query_field title \
--index_dir ./indexes/neuclir-zh.f32/ \
--qrels irds:neuclir/1/zh/trec-2022 \
--query_lang en \
--output_file ./neuclir-zh.en.title.f32.trec
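The search command writes its ranking to --output_file. Assuming that file uses the standard six-column TREC run format (query id, the literal Q0, document id, rank, score, run tag), as the .trec extension suggests, a small hypothetical helper like `parse_trec_run` below groups it by query for quick inspection.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_trec_run(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group TREC run lines by query id, highest-scoring documents first."""
    run: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, _rank, score = fields[:5]
        run[qid].append((docid, float(score)))
    for ranking in run.values():
        # Sort by score rather than trusting the rank column.
        ranking.sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)


# Toy lines in the same format; document ids are made up.
sample = ["q1 Q0 docB 2 1.5 psq", "q1 Q0 docA 1 3.0 psq", "q2 Q0 docC 1 0.7 psq"]
print(parse_trec_run(sample))
# {'q1': [('docA', 3.0), ('docB', 1.5)], 'q2': [('docC', 0.7)]}
```

To load the file produced above, pass an open file handle, e.g. `with open("./neuclir-zh.en.title.f32.trec", encoding="utf-8") as f: run = parse_trec_run(f)`. For computing retrieval metrics, the ir_measures package installed earlier is the more appropriate tool.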

Citation

@article{psq-repro,
    title = {Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval},
    author = {Eugene Yang and Suraj Nair and Dawn Lawrie and James Mayfield and Douglas W. Oard and Kevin Duh},
    journal = {arXiv preprint arXiv},
    year = {2024}
}