Putting it all together

When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed:

D → SPLADE → D → Indexer → IDX

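```python
import pyterrier as pt
pt.init(version='snapshot')
import pyt_splade

dataset = pt.get_dataset('irds:msmarco-passage')
splade = pyt_splade.SpladeFactory()

# pretokenised=True tells the indexer to accept already-tokenised input
indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)

# Encode each document with SPLADE, then pass the result to the indexer
indexer_pipe = splade.indexing() >> indexer
indexer_pipe.index(dataset.get_corpus_iter())
```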
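
For intuition about what a pretokenised index stores, here is a toy sketch in plain Python. It assumes the document encoder yields a mapping from vocabulary tokens to float weights, and that those weights are scaled and rounded to integers so they can act as pseudo term frequencies. The function name `to_pseudo_tf`, the scale of 100, and the example weights are illustrative assumptions, not values taken from pyt_splade.

```python
def to_pseudo_tf(weights, scale=100):
    """Turn float term weights into integer pseudo term frequencies.

    Hypothetical helper for illustration only; `scale` is an assumed value.
    """
    tf = {}
    for token, weight in weights.items():
        count = int(round(weight * scale))
        if count > 0:  # drop terms whose weight quantises to zero
            tf[token] = count
    return tf


# Made-up weights for a single document
doc_weights = {"neural": 1.42, "retrieval": 0.97, "sparse": 0.003}
doc_tf = to_pseudo_tf(doc_weights)
```

Terms with very small weights disappear after rounding, which keeps the posting lists sparse.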

Once you have built an index, you can build a retrieval pipeline that first encodes the query and then performs retrieval:

Q → SPLADE → Q → TF Retriever (IDX) → R

```python
splade_retr = splade.query() >> pt.BatchRetrieve('./msmarco_psg', wmodel='Tf')
```
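
A plain term-frequency weighting model is enough here because, roughly speaking, the score of a document becomes the sum over matching terms of the query term weight times the stored term frequency, which is a sparse dot product between the encoded query and the encoded document. The sketch below shows that idea in plain Python. The names `sparse_dot`, `rank`, `toy_index` and `toy_query`, and all the numbers, are hypothetical and only stand in for what the retriever does internally.

```python
def sparse_dot(query_weights, doc_tf):
    """Sum of query weight * document term frequency over shared terms."""
    return sum(w * doc_tf.get(tok, 0) for tok, w in query_weights.items())


def rank(query_weights, index):
    """Score every document and return (docid, score) pairs, best first."""
    scored = [(docid, sparse_dot(query_weights, tf)) for docid, tf in index.items()]
    scored = [(docid, score) for docid, score in scored if score > 0]  # matches only
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up index: docid -> {token: pseudo term frequency}
toy_index = {
    "d1": {"neural": 142, "retrieval": 97},
    "d2": {"sparse": 120, "retrieval": 40},
    "d3": {"cooking": 200},
}
# Made-up encoded query: token -> weight
toy_query = {"retrieval": 2.0, "sparse": 1.0}

ranking = rank(toy_query, toy_index)
```

Documents that share no terms with the encoded query receive no score and are left out, which mirrors how an inverted index only visits posting lists for query terms.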

References & Credits

This package uses Naver's SPLADE repository.