Putting it all together
When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed:
[Diagram: D → SPLADE → D → Indexer → IDX]
import pyterrier as pt
pt.init(version='snapshot')
import pyt_splade
dataset = pt.get_dataset('irds:msmarco-passage')
splade = pyt_splade.SpladeFactory()
indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)
indexer_pipe = splade.indexing() >> indexer
indexer_pipe.index(dataset.get_corpus_iter())
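To make "rewritten document contents" more concrete: a pretokenised indexer receives each document as a mapping from terms to frequencies rather than as raw text. The following is a minimal, library-free sketch of that idea. The helper `to_pretokenised`, the example weights, and the scale factor of 100 are illustrative assumptions and do not reproduce the package's actual implementation.

```python
def to_pretokenised(docno, term_weights, scale=100):
    """Turn float term weights into an integer term-frequency record.

    Hypothetical helper: it only mimics the general shape of a
    pretokenised document, {'docno': ..., 'toks': {term: frequency}}.
    """
    toks = {}
    for term, weight in term_weights.items():
        freq = round(weight * scale)  # scale, then round to the nearest integer
        if freq > 0:                  # drop terms whose scaled weight rounds to zero
            toks[term] = freq
    return {"docno": docno, "toks": toks}


# Made-up weights standing in for the output of a learned sparse encoder.
record = to_pretokenised("d1", {"chemical": 1.5, "reaction": 0.75, "the": 0.004})
print(record)
```

Storing integer frequencies lets a standard inverted index hold the learned weights without any change to its posting format, at the cost of some rounding.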
Once you have built an index, you can build a retrieval pipeline that first encodes the query and then performs retrieval:
[Diagram: Q → SPLADE → Q → TF Retriever (over IDX) → R]
splade_retr = splade.query() >> pt.BatchRetrieve('./msmarco_psg', wmodel='Tf')
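With a plain term-frequency weighting model, a document's score can be read as a dot product between the encoded query weights and the stored term frequencies. The sketch below illustrates that scoring over a tiny in-memory inverted index. The index contents, the query weights, and the function `tf_retrieve` are hypothetical; this is not Terrier's implementation.

```python
from collections import defaultdict

# Toy inverted index: term -> {docno: term frequency}. All values are made up.
inverted_index = {
    "chemical": {"d1": 150, "d2": 40},
    "reaction": {"d1": 75, "d3": 120},
    "kinetics": {"d3": 60},
}


def tf_retrieve(query_weights, index, k=10):
    """Score documents as the sum over query terms of weight * term frequency."""
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        # Terms missing from the index contribute nothing to any score.
        for docno, tf in index.get(term, {}).items():
            scores[docno] += q_weight * tf
    # Highest score first; ties are broken by docno for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up weighted query standing in for the output of a query encoder.
query = {"chemical": 2.0, "reaction": 1.0, "enzyme": 0.5}
results = tf_retrieve(query, inverted_index)
print(results)
```

Because the learned weights are already stored in the index as frequencies, no collection statistics such as IDF are applied at query time in this sketch.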
References & Credits
This package uses Naver's SPLADE repository.
- Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. SIGIR 2021.
- Craig Macdonald, Nicola Tonellotto, Sean MacAvaney, Iadh Ounis. PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval. CIKM 2021.