prithivida committed on
Commit ab4e07e
1 Parent(s): ab3bab5

Added jump to link

Files changed (1)
  1. README.md +20 -8
README.md CHANGED
@@ -30,29 +30,36 @@ SPLADE models are a fine balance between retrieval effectiveness (quality) and r
 
  **TL;DR of Our attempt & results**
  1. FLOPS tuning:
- - Separate **seq len for doc and query**, NOT a single token max_len as in Official SPLADE++.
- - **Severely restrictive token budget**: doc (128) and query (24), NOT 256 as in Official SPLADE++.
  - Idea inspired by **SparseEmbed** (instead of 2 models for query & doc).
  2. Init weights: **MLM-adapted on the MS MARCO corpus**.
- 3. Achieves modest yet competitive effectiveness wrt **MRR@10 37.22** on ID data (& OOD).
  - and a retrieval latency of **47.27ms** (multi-threaded),
  - on **mono-GPU** with **only 5 negatives per query**.
- 4. Observations for industry settings:
  - Effectiveness on custom domains needs more than just **trading FLOPS for tiny gains**.
  - The premise "SPLADE++ are not well suited to mono-cpu retrieval" does not hold.
 
- <img src="./ID.png" width=800/>
 
  *Note: The paper refers to the best-performing models as SPLADE++, hence, for consistency, we reuse the same name.*
 
  ## Why is FLOPS one of the key metrics in an industry setting?
 
- While ONLY an empirical analysis on a large sample makes sense, here is a spot check: a qualitative example to give you an idea. Our models achieve on-par, competitive effectiveness with **~10% and ~100% fewer tokens than comparable SPLADE++ models, including SoTA**.
  (We show quantitative results in the next section.)
 
  So, **by design, "how to beat SoTA MRR?" was never our goal**. Instead, we asked: "At what cost can we achieve acceptable effectiveness, i.e. MRR@10?". Nonchalantly reducing the lambda values (λQ, λD, see the table above) will achieve a better MRR.
  But lower lambda values = higher FLOPS = more tokens = poorer efficiency. This is NOT desirable in an industry setting.
 
  **Ours**
  ```python
  number of actual dimensions: 113
@@ -76,6 +83,8 @@ SPLADE BOW rep:
 
  - *Note 1: This specific passage was used as an example for [ease of comparison](https://github.com/naver/splade/blob/main/inference_splade.ipynb)*
 
  ## How does this translate into empirical metrics?
 
  Our models are token-sparse and yet effective. This translates to faster retrieval (user experience) and a smaller index size ($). The respective metrics, mean retrieval time on the standard MS MARCO small dev set and scaled total FLOPS loss, are shown below.
@@ -85,6 +94,8 @@ But it is unclear how well these hyperparameters are transferable to other domai
 
  <img src="./Metrics.png" width=800/>
 
  **Note: Why Anserini and not PISA?** *Anserini is a production-ready, Lucene-based library. Common industry search deployments use Solr or Elasticsearch, which are Lucene-based, hence the performance can be comparable. PISA latency is irrelevant for industry, as it is a research-only system.*
  The full [anserini evaluation log](https://huggingface.co/DOST/SPLADEplusplus_EN/blob/main/anserini_run.log), with encoding, indexing and querying details, is here.
 
@@ -93,7 +104,7 @@ The full [anserini evaluation log](https://huggingface.co/DOST/SPLADEplusplus_EN
  **Our model differs in a few more aspects**
  - **CoCondenser weights**: Unlike the best Official SPLADE++ or SparseEmbed, we do NOT initialise weights from Luyu/co-condenser* models. Yet we achieve CoCondenser SPLADE-level performance. More on this later.
  - **Same-size models:** Official SPLADE++, SparseEmbed and ours all fine-tune a base model of the same size, that of `bert-base-uncased`.
-
 
  ## Roadmap and future directions for industry suitability
 
@@ -109,7 +120,8 @@ To enable a light weight inference solution without heavy **No Torch dependency*
  Of course, if it doesn't matter, you could always use these models with the Hugging Face transformers library.
 
 
- # How to use:
 
  ## With SPLADERunner Library
 
 
 
  **TL;DR of Our attempt & results**
  1. FLOPS tuning:
+ - Separate **seq len for doc and query**, unlike Official SPLADE++.
+ - **Severely restrictive token budget**: doc (128) & query (24), NOT 256 as in Official SPLADE++.
  - Idea inspired by **SparseEmbed** (instead of 2 models for query & doc).
  2. Init weights: **MLM-adapted on the MS MARCO corpus**.
+ 3. Achieves modest yet competitive effectiveness: **MRR@10 37.22** on ID data (& OOD).
  - and a retrieval latency of **47.27ms** (multi-threaded),
  - on **mono-GPU** with **only 5 negatives per query**.
+ 4. For industry settings:
  - Effectiveness on custom domains needs more than just **trading FLOPS for tiny gains**.
  - The premise "SPLADE++ are not well suited to mono-cpu retrieval" does not hold.
 
+ <img src="./ID.png" width=500 height=350/>
 
  *Note: The paper refers to the best-performing models as SPLADE++, hence, for consistency, we reuse the same name.*
 
+ [**JUMP TO "How to use" to try it out**](#htu) or continue for more details.
+
+ <br/>
+
  ## Why is FLOPS one of the key metrics in an industry setting?
 
+
+ While ONLY an empirical analysis on a large sample makes sense, here is a spot check: a qualitative example to give you an idea. Our models achieve on-par, competitive effectiveness with **~10% and ~100% fewer tokens than comparable SPLADE++ models, including SoTA**.
  (We show quantitative results in the next section.)
 
  So, **by design, "how to beat SoTA MRR?" was never our goal**. Instead, we asked: "At what cost can we achieve acceptable effectiveness, i.e. MRR@10?". Nonchalantly reducing the lambda values (λQ, λD, see the table above) will achieve a better MRR.
  But lower lambda values = higher FLOPS = more tokens = poorer efficiency. This is NOT desirable in an industry setting.
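The trade-off in the paragraph above follows from how the FLOPS penalty is defined. As a sketch, assuming this model uses the standard regularizer from the SPLADE papers (notation here is ours: a batch of $N$ documents, $w_j^{(d_i)}$ the weight of vocabulary term $j$ in document $d_i$, $V$ the vocabulary, $\mathcal{L}_{\text{rank}}$ the ranking loss), the penalty and the total training loss are:

$$
\ell_{\text{FLOPS}} = \sum_{j \in V} \left( \frac{1}{N} \sum_{i=1}^{N} w_j^{(d_i)} \right)^2,
\qquad
\mathcal{L} = \mathcal{L}_{\text{rank}} + \lambda_Q \, \ell_{\text{FLOPS}}^{Q} + \lambda_D \, \ell_{\text{FLOPS}}^{D}
$$

Here $\ell_{\text{FLOPS}}^{Q}$ and $\ell_{\text{FLOPS}}^{D}$ apply the same formula to the query and document representations respectively. Because $\lambda_Q$ and $\lambda_D$ scale a term that grows with every vocabulary entry that stays active across the batch, lowering them lets more terms survive: MRR@10 can improve, but each query and document carries more tokens, which raises retrieval cost.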
 
+ <details>
+
  **Ours**
  ```python
  number of actual dimensions: 113
 
 
  - *Note 1: This specific passage was used as an example for [ease of comparison](https://github.com/naver/splade/blob/main/inference_splade.ipynb)*
 
+ </details>
+
  ## How does this translate into empirical metrics?
 
  Our models are token-sparse and yet effective. This translates to faster retrieval (user experience) and a smaller index size ($). The respective metrics, mean retrieval time on the standard MS MARCO small dev set and scaled total FLOPS loss, are shown below.
 
 
  <img src="./Metrics.png" width=800/>
 
+ <details>
+
  **Note: Why Anserini and not PISA?** *Anserini is a production-ready, Lucene-based library. Common industry search deployments use Solr or Elasticsearch, which are Lucene-based, hence the performance can be comparable. PISA latency is irrelevant for industry, as it is a research-only system.*
  The full [anserini evaluation log](https://huggingface.co/DOST/SPLADEplusplus_EN/blob/main/anserini_run.log), with encoding, indexing and querying details, is here.
 
 
  **Our model differs in a few more aspects**
  - **CoCondenser weights**: Unlike the best Official SPLADE++ or SparseEmbed, we do NOT initialise weights from Luyu/co-condenser* models. Yet we achieve CoCondenser SPLADE-level performance. More on this later.
  - **Same-size models:** Official SPLADE++, SparseEmbed and ours all fine-tune a base model of the same size, that of `bert-base-uncased`.
+ </details>
 
  ## Roadmap and future directions for industry suitability
 
 
  Of course, if it doesn't matter, you could always use these models with the Hugging Face transformers library.
 
 
+ <h1 id="htu">How to use? </h1>
+
 
  ## With SPLADERunner Library