wiki-all-6-3-dpr2-multi

A Faiss FlatIP index of the wiki-all-6-3 corpus (https://huggingface.co/datasets/castorini/odqa-wiki-corpora), encoded by a second-iteration DPR model trained on multiple QA datasets (castorini/wiki-all-6-3-multi-dpr2-passage-encoder). This index was generated on 2023/01/03 on narval at commits:

with the following command to generate the embeddings (from Tevatron repo):

python -m tevatron.driver.jax_encode \
  --output_dir=temp \
  --model_name_or_path wiki-all-6-3-multi-dpr2-passage-encoder \
  --per_device_eval_batch_size 1248 \
  --dataset_name wiki_all_6_3.jsonl \
  --encoded_save_path corpus_emb.pkl \
  --p_max_len 256
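
For readers unfamiliar with the FlatIP index type named above: a flat inner-product index scores every stored passage vector against the query vector by inner product and returns the highest-scoring passages. The sketch below illustrates that idea in plain Python on toy data. It does not use Faiss or load this index, and the names `flat_ip_search`, `toy_passages`, and `toy_query` are hypothetical, chosen only for illustration.

```python
import heapq


def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def flat_ip_search(query, passages, k=2):
    """Exhaustively score every passage vector against the query.

    Returns the k best (score, passage_id) pairs, highest score first.
    This mirrors what a flat inner-product index computes; it is an
    illustrative stand-in, not the Faiss implementation.
    """
    scored = ((inner_product(query, vec), pid) for pid, vec in passages.items())
    return heapq.nlargest(k, scored)


# Toy 3-dimensional vectors (assumed data); real passage embeddings
# have far more dimensions.
toy_passages = {
    "doc0": [1.0, 0.0, 0.0],
    "doc1": [0.5, 0.5, 0.0],
    "doc2": [0.0, 0.0, 1.0],
}
toy_query = [2.0, 1.0, 0.0]

top_hits = flat_ip_search(toy_query, toy_passages, k=2)
print(top_hits)
```

Because a flat index compares the query against every vector, results are exact, but search cost grows linearly with the number of passages.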