Upload folder using huggingface_hub
This view is limited to 50 files because it contains too many changes. See the raw diff for the full list.
- .gitattributes +1 -1
- README.md +206 -2
- index/0.codes.pt +3 -0
- index/0.metadata.json +6 -0
- index/0.residuals.pt +3 -0
- index/1.codes.pt +3 -0
- index/1.metadata.json +6 -0
- index/1.residuals.pt +3 -0
- index/10.codes.pt +3 -0
- index/10.metadata.json +6 -0
- index/10.residuals.pt +3 -0
- index/11.codes.pt +3 -0
- index/11.metadata.json +6 -0
- index/11.residuals.pt +3 -0
- index/12.codes.pt +3 -0
- index/12.metadata.json +6 -0
- index/12.residuals.pt +3 -0
- index/13.codes.pt +3 -0
- index/13.metadata.json +6 -0
- index/13.residuals.pt +3 -0
- index/14.codes.pt +3 -0
- index/14.metadata.json +6 -0
- index/14.residuals.pt +3 -0
- index/15.codes.pt +3 -0
- index/15.metadata.json +6 -0
- index/15.residuals.pt +3 -0
- index/16.codes.pt +3 -0
- index/16.metadata.json +6 -0
- index/16.residuals.pt +3 -0
- index/17.codes.pt +3 -0
- index/17.metadata.json +6 -0
- index/17.residuals.pt +3 -0
- index/18.codes.pt +3 -0
- index/18.metadata.json +6 -0
- index/18.residuals.pt +3 -0
- index/19.codes.pt +3 -0
- index/19.metadata.json +6 -0
- index/19.residuals.pt +3 -0
- index/2.codes.pt +3 -0
- index/2.metadata.json +6 -0
- index/2.residuals.pt +3 -0
- index/20.codes.pt +3 -0
- index/20.metadata.json +6 -0
- index/20.residuals.pt +3 -0
- index/21.codes.pt +3 -0
- index/21.metadata.json +6 -0
- index/21.residuals.pt +3 -0
- index/3.codes.pt +3 -0
- index/3.metadata.json +6 -0
- index/3.residuals.pt +3 -0
.gitattributes CHANGED
@@ -33,4 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-
+index/collection.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -5,7 +5,211 @@ language:
tags:
- ColBERT
---

<p align="center">
  <img align="center" src="docs/images/colbertofficial.png" width="430px" />
</p>
<p align="left">

# ColBERT (v2)

### ColBERT is a _fast_ and _accurate_ retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.

[<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanford-futuredata/ColBERT/blob/main/docs/intro2new.ipynb)

<p align="center">
  <img align="center" src="docs/images/ColBERT-Framework-MaxSim-W370px.png" />
</p>
<p align="center">
  <b>Figure 1:</b> ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.
</p>

As Figure 1 illustrates, ColBERT relies on fine-grained **contextual late interaction**: it encodes each passage into a **matrix** of token-level embeddings (shown above in blue). Then at search time, it embeds every query into another matrix (shown in green) and efficiently finds passages that contextually match the query using scalable vector-similarity (`MaxSim`) operators.

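To make the `MaxSim` operator concrete, here is a minimal PyTorch sketch of late-interaction scoring for a single query-passage pair (an illustration of the idea, not the library's optimized implementation):

```
import torch

def maxsim_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    # Q: (num_query_tokens, dim) query token embeddings
    # D: (num_passage_tokens, dim) passage token embeddings
    # Both are assumed L2-normalized, so dot products are cosine similarities.
    token_similarities = Q @ D.T                        # (num_query_tokens, num_passage_tokens)
    best_match = token_similarities.max(dim=1).values   # MaxSim: best passage token per query token
    return best_match.sum()                             # late-interaction relevance score
```
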
These rich interactions allow ColBERT to surpass the quality of _single-vector_ representation models, while scaling efficiently to large corpora. You can read more in our papers:

* [**ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT**](https://arxiv.org/abs/2004.12832) (SIGIR'20).
* [**Relevance-guided Supervision for OpenQA with ColBERT**](https://arxiv.org/abs/2007.00814) (TACL'21).
* [**Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval**](https://arxiv.org/abs/2101.00436) (NeurIPS'21).
* [**ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction**](https://arxiv.org/abs/2112.01488) (NAACL'22).
* [**PLAID: An Efficient Engine for Late Interaction Retrieval**](https://arxiv.org/abs/2205.09707) (CIKM'22).

----

## 🚨 **Announcements**

* (1/29/23) We have merged a new index updater feature and support for additional Hugging Face models! These are in beta, so please give us feedback as you try them out.
* (1/24/23) If you're looking for the **DSP** framework for composing ColBERTv2 and LLMs, it's at: https://github.com/stanfordnlp/dsp

----

## ColBERTv1

The ColBERTv1 code from the SIGIR'20 paper is in the [`colbertv1` branch](https://github.com/stanford-futuredata/ColBERT/tree/colbertv1). See [here](#branches) for more information on other branches.

## Installation

ColBERT requires Python 3.7+ and PyTorch 1.9+ and uses the [Hugging Face Transformers](https://github.com/huggingface/transformers) library.

We strongly recommend creating a conda environment using the commands below. (If you don't have conda, follow the official [conda installation guide](https://docs.anaconda.com/anaconda/install/linux/#installation).)

We have also included a new environment file specifically for CPU-only environments (`conda_env_cpu.yml`). If you are testing CPU execution on a machine that includes GPUs, you might need to specify `CUDA_VISIBLE_DEVICES=""` as part of your command. Note that a GPU is still required for training and indexing.

```
conda env create -f conda_env[_cpu].yml
conda activate colbert
```

If you face any problems, please [open a new issue](https://github.com/stanford-futuredata/ColBERT/issues) and we'll help you promptly!

## Overview

Using ColBERT on a dataset typically involves the following steps.

**Step 0: Preprocess your collection.** At its simplest, ColBERT works with tab-separated (TSV) files: a file (e.g., `collection.tsv`) will contain all passages and another (e.g., `queries.tsv`) will contain a set of queries for searching the collection.

**Step 1: Download the [pre-trained ColBERTv2 checkpoint](https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz).** This checkpoint has been trained on the MS MARCO Passage Ranking task. You can also _optionally_ [train your own ColBERT model](#training).

**Step 2: Index your collection.** Once you have a trained ColBERT model, you need to [index your collection](#indexing) to permit fast retrieval. This step encodes all passages into matrices, stores them on disk, and builds data structures for efficient search.

**Step 3: Search the collection with your queries.** Given the model and index, you can [issue queries over the collection](#retrieval) to retrieve the top-k passages for each query.

Below, we illustrate these steps via an example run on the MS MARCO Passage Ranking task.

## API Usage Notebook

**NEW**: We have an experimental notebook on [Google Colab](https://colab.research.google.com/github/stanford-futuredata/ColBERT/blob/main/docs/intro2new.ipynb) that you can use with free GPUs. Indexing 10,000 passages on the free Colab T4 GPU takes six minutes.

The Jupyter notebook **[docs/intro.ipynb](docs/intro.ipynb)** illustrates the key features of ColBERT with the new Python API.

It includes how to download the ColBERTv2 model checkpoint trained on MS MARCO Passage Ranking and how to download our new LoTTE benchmark.

## Data

This repository works directly with a simple **tab-separated file** format to store queries, passages, and top-k ranked lists.

* Queries: each line is `qid \t query text`.
* Collection: each line is `pid \t passage text`.
* Top-k Ranking: each line is `qid \t pid \t rank`.

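For concreteness, a toy collection and query file might look like the following (the contents are made up, and the `#` lines are annotations rather than part of the files):

```
# collection.tsv (pid \t passage text)
0	ColBERT is a fast and accurate retrieval model.
1	Late interaction matches query tokens against passage token embeddings.

# queries.tsv (qid \t query text)
0	what is late interaction retrieval
```
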
This works directly with the data format of the [MS MARCO Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) dataset. You will need the training triples (`triples.train.small.tar.gz`), the official top-1000 ranked lists for the dev set queries (`top1000.dev`), and the dev set relevant passages (`qrels.dev.small.tsv`). For indexing the full collection, you will also need the list of passages (`collection.tar.gz`).

## Indexing

For fast retrieval, indexing precomputes the ColBERT representations of passages.

Example usage:

```
from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert import Indexer

if __name__ == '__main__':
    with Run().context(RunConfig(nranks=1, experiment="msmarco")):

        config = ColBERTConfig(
            nbits=2,
            root="/path/to/experiments",
        )
        indexer = Indexer(checkpoint="/path/to/checkpoint", config=config)
        indexer.index(name="msmarco.nbits=2", collection="/path/to/MSMARCO/collection.tsv")
```

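The index produced by this step is chunked, as reflected in the `index/` files added in this commit: for each chunk `N`, ColBERTv2's residual compression stores the nearest-centroid codes of the chunk's embeddings in `N.codes.pt` and their quantized residuals in `N.residuals.pt`, alongside an `N.metadata.json` recording the chunk's passage and embedding offsets.
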
## Retrieval

We typically recommend that you use ColBERT for **end-to-end** retrieval, where it directly finds its top-k passages from the full collection:

```
from colbert.data import Queries
from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert import Searcher

if __name__ == '__main__':
    with Run().context(RunConfig(nranks=1, experiment="msmarco")):

        config = ColBERTConfig(
            root="/path/to/experiments",
        )
        searcher = Searcher(index="msmarco.nbits=2", config=config)
        queries = Queries("/path/to/MSMARCO/queries.dev.small.tsv")
        ranking = searcher.search_all(queries, k=100)
        ranking.save("msmarco.nbits=2.ranking.tsv")
```

You can optionally specify the `ncells`, `centroid_score_threshold`, and `ndocs` search hyperparameters to trade off between speed and result quality. Defaults for different values of `k` are listed in `colbert/searcher.py`.

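For example, to trade some speed for quality, you might override these in the config before constructing the `Searcher` (the values below are purely illustrative, not tuned recommendations):

```
from colbert.infra import ColBERTConfig
from colbert import Searcher

config = ColBERTConfig(
    root="/path/to/experiments",
    ncells=4,                       # probe more candidate cells per query vector
    centroid_score_threshold=0.4,   # prune fewer candidates during filtering
    ndocs=4096,                     # score a larger pool of candidate passages
)
searcher = Searcher(index="msmarco.nbits=2", config=config)
```
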
We can evaluate the MS MARCO rankings using the following command:

```
python -m utility.evaluate.msmarco_passages --ranking "/path/to/msmarco.nbits=2.ranking.tsv" --qrels "/path/to/MSMARCO/qrels.dev.small.tsv"
```

## Training

We provide a [pre-trained model checkpoint](https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz), but we also detail how to train from scratch here.
Note that this example demonstrates the ColBERTv1 style of training, but the provided checkpoint was trained with ColBERTv2.

Training requires a JSONL triples file with a `[qid, pid+, pid-]` list per line. The query IDs and passage IDs correspond to the specified `queries.tsv` and `collection.tsv` files respectively.

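For example, a few lines of such a triples file might look like this (the IDs are made up for illustration); each line lists a query ID, a positive passage ID, and a negative passage ID:

```
[0, 2955, 50]
[0, 2955, 7810]
[1, 633, 24]
```
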
Example usage (training on 4 GPUs):

```
from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert import Trainer

if __name__ == '__main__':
    with Run().context(RunConfig(nranks=4, experiment="msmarco")):

        config = ColBERTConfig(
            bsize=32,
            root="/path/to/experiments",
        )
        trainer = Trainer(
            triples="/path/to/MSMARCO/triples.train.small.tsv",
            queries="/path/to/MSMARCO/queries.train.small.tsv",
            collection="/path/to/MSMARCO/collection.tsv",
            config=config,
        )

        checkpoint_path = trainer.train()

        print(f"Saved checkpoint to {checkpoint_path}...")
```

## Running a lightweight ColBERTv2 server

We provide a script to run a lightweight server which serves k (up to 100) results in ranked order for a given search query, in JSON format. This script can be used to power DSP programs.

To run the server, update the environment variables `INDEX_ROOT` and `INDEX_NAME` in the `.env` file to point to the appropriate ColBERT index. Then run the following command:
```
python server.py
```

A sample query:
```
http://localhost:8893/api/search?query=Who won the 2022 FIFA world cup&k=25
```

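A minimal Python client for this endpoint might look like the following (it simply prints whatever JSON the server returns):

```
import requests

# Assumes the server above is running locally on its default port (8893).
params = {"query": "Who won the 2022 FIFA world cup", "k": 25}
response = requests.get("http://localhost:8893/api/search", params=params)
print(response.json())  # the ranked results, in JSON format
```
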
## Branches

### Supported branches

* [`main`](https://github.com/stanford-futuredata/ColBERT/tree/main): Stable branch with ColBERTv2 + PLAID.
* [`colbertv1`](https://github.com/stanford-futuredata/ColBERT/tree/colbertv1): Legacy branch for ColBERTv1.

### Deprecated branches

* [`new_api`](https://github.com/stanford-futuredata/ColBERT/tree/new_api): Base ColBERTv2 implementation.
* [`cpu_inference`](https://github.com/stanford-futuredata/ColBERT/tree/cpu_inference): ColBERTv2 implementation with CPU search support.
* [`fast_search`](https://github.com/stanford-futuredata/ColBERT/tree/fast_search): ColBERTv2 implementation with PLAID.
* [`binarization`](https://github.com/stanford-futuredata/ColBERT/tree/binarization): ColBERT with a baseline binarization-based compression strategy (as opposed to ColBERTv2's residual compression, which we found to be more robust).

## Acknowledgments

ColBERT logo designed by Chuyi Zhang.

index/0.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:faa8af252151d04c1f7a59d4ea9e6d6392ff68a95a1306f128a036a48f9a9692
+size 19690268
index/0.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 0,
+    "num_passages": 25000,
+    "num_embeddings": 4922279,
+    "embedding_offset": 0
+}
index/0.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ce26ce981afd7abb6fd40a62085465df358985e8eeb971e5c956e050332222fe
+size 157514096
index/1.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f7c59999e8153671d538c5112a66268c2baff0f4c739b36cb40c8e98f55f1c7
+size 19711452
index/1.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 25000,
+    "num_passages": 25000,
+    "num_embeddings": 4927578,
+    "embedding_offset": 4922279
+}
index/1.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3c39dd8be0d558c249f00296c49d4e3328310442c5c0ed8cef6bf1da15a5ff54
+size 157683696
index/10.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7c7e415678a2a53cbb08c6c945c9318118d988cb1ed1e13a8140095c96f36f2e
+size 19726049
index/10.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 250000,
+    "num_passages": 25000,
+    "num_embeddings": 4931218,
+    "embedding_offset": 49263426
+}
index/10.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8428758678a0784290af52ee61136ffedf1bfcf927c44971ab2b2ca4f7d66fd3
+size 157800181
index/11.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:afa18449692e2659a44ced6fd410d2ed95f5cdd14b0928948f38ba5a8d2e74b0
+size 19693857
index/11.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 275000,
+    "num_passages": 25000,
+    "num_embeddings": 4923168,
+    "embedding_offset": 54194644
+}
index/11.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fb440faf76b82ce0df8157e33513d8bf7512242fd67156770787178be00c2cfa
+size 157542581
index/12.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bfe221c370bdc8b3ce207dcdc5d041606537c26cdfb617239710dcb0f34fed1b
+size 19740065
index/12.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 300000,
+    "num_passages": 25000,
+    "num_embeddings": 4934727,
+    "embedding_offset": 59117812
+}
index/12.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6515078c357fe9139aa1d523e0a1c3ada550b470de73594b9fb487339d4ea3ad
+size 157912437
index/13.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:25c6f35e19816cdb269ef5aac105439bd8028a959a06269b60d89d53214619f1
+size 19726881
index/13.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 325000,
+    "num_passages": 25000,
+    "num_embeddings": 4931425,
+    "embedding_offset": 64052539
+}
index/13.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f2a4ad6c28d18dc8c4b9685cab77924b9517044fa613ab030bbd978c44ec0580
+size 157806773
index/14.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:875e3ae0665cb6756809fc46cd52c5fd131327d165a390e8b2ce9248f3f00228
+size 19688353
index/14.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 350000,
+    "num_passages": 25000,
+    "num_embeddings": 4921795,
+    "embedding_offset": 68983964
+}
index/14.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f1b840fb2a83987eaf0c2623c9a854cb92171476226084f740e849e9311a5c32
+size 157498613
index/15.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6bcca9c7ff3e3d81295f2bbe74d11c9ce0eac3333342856893d9e368074cccee
+size 19732769
index/15.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 375000,
+    "num_passages": 25000,
+    "num_embeddings": 4932896,
+    "embedding_offset": 73905759
+}
index/15.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f69499832c32274f93508cf21372526fe16d51534861db23d42c31feba32cf7f
+size 157853877
index/16.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:97401b22758563a9939e3a395c8749d4fbdc559720160606f5411e4e5215e9a2
+size 19717601
index/16.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 400000,
+    "num_passages": 25000,
+    "num_embeddings": 4929109,
+    "embedding_offset": 78838655
+}
index/16.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d481fb96abbc8761b6a53295ad49805a80b5d008a4f2e01edc898c93ca1fd847
+size 157732661
index/17.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3feac65211d019980e0a4681ea6fe39a447d100adf0a2e8a60e3d3915ad918bc
+size 19712929
index/17.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 425000,
+    "num_passages": 25000,
+    "num_embeddings": 4927941,
+    "embedding_offset": 83767764
+}
index/17.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:269e4abf6094cbeafee96b6dc264a546209fa8045972abf3fe6f627ed2c1d32f
+size 157695285
index/18.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a2ffea2de869f7857222eed7e6ecb53179d96f203e634a2ccfe2e362bd02fe4b
+size 19744801
index/18.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 450000,
+    "num_passages": 25000,
+    "num_embeddings": 4935910,
+    "embedding_offset": 88695705
+}
index/18.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:943a180f327d419c584c1c868ea844d7fcffdccb270ee6bedc7d3e0ec3e28bba
+size 157950325
index/19.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1d87d5e98754e8976a4c3403f3234f0e9c4985123ef5bd766c65a2193a27508e
+size 19739041
index/19.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 475000,
+    "num_passages": 25000,
+    "num_embeddings": 4934468,
+    "embedding_offset": 93631615
+}
index/19.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:03825fdc7951a347f920381baae96ffbd2cda371484e14aba837e01cd821b568
+size 157904181
index/2.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2b254c5cf8092972df7bd1d9c1ecc4752ae3ce5989a2babf55fe4706ff3dd7c2
+size 19715164
index/2.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 50000,
+    "num_passages": 25000,
+    "num_embeddings": 4928507,
+    "embedding_offset": 9849857
+}
index/2.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6711deef78107ccf6289262455da29a1e3057024545bf47017eb1fb531275356
+size 157713392
index/20.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8f3e4f98740b61664e85788d7a446e3bd39dee2b8f20ac896988e413a8c55b4f
+size 19704161
index/20.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 500000,
+    "num_passages": 25000,
+    "num_embeddings": 4925756,
+    "embedding_offset": 98566083
+}
index/20.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31ef9f7b66c831825a320cb3ed9ccd3a8d27cba65e504e4441a73e065b31c483
+size 157625397
index/21.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:00acd9085e7ba64fae6f32b9de08bd75113fa176cb96d1c6ab769a48821acc00
+size 6679777
index/21.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 525000,
+    "num_passages": 8458,
+    "num_embeddings": 1669653,
+    "embedding_offset": 103491839
+}
index/21.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:695e2365603dbcde298bee700cd3099270a5732e1fb009d6466831ce8ead3364
+size 53430069
index/3.codes.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:75b430a72ed08008ffc9a393bc160310cd1ee75c5de7871dd82fa05317f41b9c
+size 19720412
index/3.metadata.json ADDED
@@ -0,0 +1,6 @@
+{
+    "passage_offset": 75000,
+    "num_passages": 25000,
+    "num_embeddings": 4929813,
+    "embedding_offset": 14778364
+}
index/3.residuals.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:52e2dc43f6443185f78533abf6354681abdc8c554846a06f87fb0ed85c54c1fc
+size 157755184