eugene-yang committed on
Commit
3e8abc2
1 Parent(s): 8203834

update README

  1. README.md +54 -0
README.md CHANGED
@@ -1,3 +1,57 @@
1
  ---
2
  license: mit
3
  ---
4
+ # Translation Tables for Probabilistic Structured Queries
5
+
6
+ This repository contains the raw translation tables for the package [`fast_psq`](https://github.com/hltcoe/PSQ).
7
+ Please refer to the GitHub repository for more information.
8
+ The following is a brief example of using the tables.
9
+
10
+ ## Get started
11
+
12
+ `fast_psq` is available on PyPI.
13
+ ```bash
14
+ pip install fast_psq ir_datasets ir_measures
15
+ ```
16
+
17
+ The following is an example indexing command.
18
+ ```bash
19
+ python -m fast_psq.index \
20
+ --doc_file irds:neuclir/1/zh/trec-2022 \
21
+ --lang zh \
22
+ --psq_file hltcoe/psq_translation_tables:zh.table.dict.gz \
23
+ --min_translation_prob 0.00010 \
24
+ --max_translation_alternatives 64 \
25
+ --max_translation_cdf 0.99 \
26
+ --docid doc_id \
27
+ --title title \
28
+ --body text \
31
+ --output_dir ./indexes/neuclir-zh.f32/ \
32
+ --compression \
33
+ --nworkers 64
34
+ ```
35
+
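The three pruning flags in the indexing command (`--min_translation_prob`, `--max_translation_alternatives`, `--max_translation_cdf`) limit how many translations of each term enter the index. The sketch below shows one plausible reading of those thresholds on a toy distribution. It is not taken from `fast_psq`: the function name `prune_translations`, the order in which the thresholds are applied, and the final renormalization are all assumptions made for illustration.

```python
from typing import Dict


def prune_translations(
    probs: Dict[str, float],
    min_prob: float = 1e-4,
    max_alternatives: int = 64,
    max_cdf: float = 0.99,
) -> Dict[str, float]:
    """Prune one term's translation distribution, then renormalize.

    Hypothetical helper; the actual fast_psq logic may differ.
    """
    # Visit candidate translations from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    kept: Dict[str, float] = {}
    cumulative = 0.0
    for term, p in ranked:
        # Stop at the first candidate below the probability floor,
        # or once the cap on alternatives has been reached.
        if p < min_prob or len(kept) >= max_alternatives:
            break
        kept[term] = p
        cumulative += p
        # Stop once the kept translations cover enough probability mass.
        if cumulative >= max_cdf:
            break

    total = sum(kept.values())
    if total == 0:
        return {}
    # Renormalize so the surviving probabilities sum to 1 (an assumption).
    return {term: p / total for term, p in kept.items()}


# Toy distribution for a single source term (made-up numbers).
toy = {"bank": 0.6, "shore": 0.3, "bench": 0.0999, "noise": 0.00005}
pruned = prune_translations(toy)
print(sorted(pruned, key=pruned.get, reverse=True))
```

Under these assumptions, lowering `max_cdf` or `max_translation_alternatives` shrinks the index and speeds up indexing and search, at the cost of dropping rarer translations.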
36
+ The following is an example search command.
37
+ ```bash
38
+ python -m fast_psq.search \
39
+ --query_source irds:neuclir/1/zh/trec-2022 \
40
+ --query_field title \
41
+ --index_dir ./indexes/neuclir-zh.f32/ \
42
+ --qrels irds:neuclir/1/zh/trec-2022 \
43
+ --query_lang en \
44
+ --output_file ./neuclir-zh.en.title.f32.trec
45
+ ```
46
+
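The `.trec` extension passed to `--output_file` suggests the standard six-column TREC run format (`qid Q0 docid rank score tag`), though the README does not state this. Assuming that layout, a minimal standard-library sketch for loading the ranked lists might look like the following; `parse_trec_run` is a hypothetical helper, not part of `fast_psq`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_trec_run(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group (docid, score) pairs by query ID, best score first.

    Assumes six whitespace-separated columns per line:
    qid, Q0, docid, rank, score, tag. Other lines are skipped.
    """
    run: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # ignore blank or malformed lines
        qid, _q0, docid, _rank, score, _tag = parts
        run[qid].append((docid, float(score)))

    # Order each query's results by descending score.
    for results in run.values():
        results.sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)


# In-memory demo with made-up lines; for a real file, pass an open
# file handle instead, e.g. open("./neuclir-zh.en.title.f32.trec").
demo = [
    "1 Q0 docB 2 9.0 psq",
    "1 Q0 docA 1 12.5 psq",
    "2 Q0 docC 1 3.2 psq",
]
for qid, results in parse_trec_run(demo).items():
    print(qid, results[:2])
```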
47
+
48
+ ## Citation
49
+
50
+ ```bibtex
51
+ @article{psq-repro,
52
+ title = {Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval},
53
+ author = {Eugene Yang and Suraj Nair and Dawn Lawrie and James Mayfield and Douglas W. Oard and Kevin Duh},
54
+ journal = {arXiv preprint arXiv},
55
+ year = {2024}
56
+ }
57
+ ```