prithivida committed • Commit ab4e07e • Parent(s): ab3bab5
Added jump to link

README.md CHANGED
@@ -30,29 +30,36 @@ SPLADE models are a fine balance between retrieval effectiveness (quality) and r

**TL;DR of Our attempt & results**
1. FLOPS tuning:
-    - Separate **seq len for doc and query**
-    - **Severely restrictive token budget**: doc (128)
    - Idea inspired by **SparseEmbed** (instead of 2 models for query & doc).
2. Init weights: **MLM adapted on the MS MARCO corpus**.
-3. Achieves a modest yet competitive effectiveness
   - and a retrieval latency of **47.27 ms** (multi-threaded),
   - on a **mono-GPU** with **only 5 negatives per query**.
-4.
    - Effectiveness on custom domains needs more than just **trading FLOPS for tiny gains**.
    - The premise "SPLADE++ is not well suited to mono-CPU retrieval" does not hold.

-<img src="./ID.png" width=

*Note: The paper refers to the best-performing models as SPLADE++, hence for consistency we reuse the same name.*
## Why is FLOPS one of the key metrics for an industry setting?

(We will show quantitative results in the next section.)

So, **by design, "how to beat SoTA MRR?" was never our goal**; instead, "at what cost can we achieve an acceptable effectiveness, i.e. MRR@10?". Nonchalantly reducing the lambda values (λQ, λD; see the table above) will achieve a better MRR.
But lower lambda values = higher FLOPS = more tokens = poorer efficiency. This is NOT desirable for an industry setting.

**Ours**
```python
number of actual dimensions: 113
```
@@ -76,6 +83,8 @@ SPLADE BOW rep:

- *Note 1: This specific passage was used as an example for [ease of comparison](https://github.com/naver/splade/blob/main/inference_splade.ipynb).*

## How does it translate into Empirical metrics?

Our models are token-sparse and yet effective. This translates to faster retrieval (user experience) and a smaller index size ($). Mean retrieval time on the standard MS MARCO small dev set and scaled total FLOPS loss are the respective metrics, shown below.
@@ -85,6 +94,8 @@ But it is unclear how well these hyperparameters are transferable to other domains

<img src="./Metrics.png" width=800/>

**Note: Why Anserini and not PISA?** *Anserini is a production-ready Lucene-based library. Common industry search deployments use Solr or Elasticsearch, which are Lucene-based, hence the performance is comparable. PISA latency is irrelevant for industry as it is a research-only system.*
The full [anserini evaluation log](https://huggingface.co/DOST/SPLADEplusplus_EN/blob/main/anserini_run.log), with encoding, indexing and querying details, is here.
@@ -93,7 +104,7 @@ The full [anserini evaluation log](https://huggingface.co/DOST/SPLADEplusplus_EN

**Our model differs in a few more aspects**
- **CoCondenser weights**: Unlike the best official SPLADE++ or SparseEmbed, we do NOT initialize weights from Luyu/co-condenser* models. Yet we achieve CoCondenser-SPLADE-level performance. More on this later.
- **Same-size models:** Official SPLADE++, SparseEmbed and ours all fine-tune a base model of the same size as `bert-base-uncased`.

## Roadmap and future directions for Industry Suitability

@@ -109,7 +120,8 @@ To enable a light weight inference solution without heavy **No Torch dependency*

Of course, if that doesn't matter, you could always use these models with the Hugging Face transformers library.


## With SPLADERunner Library


**TL;DR of Our attempt & results**
1. FLOPS tuning:
+    - Separate **seq len for doc and query**, unlike official SPLADE++ (see the tokenizer sketch after this list).
+    - **Severely restrictive token budget**: doc (128) & query (24), NOT 256, unlike official SPLADE++.
    - Idea inspired by **SparseEmbed** (instead of 2 models for query & doc).
2. Init weights: **MLM adapted on the MS MARCO corpus**.
+3. Achieves a modest yet competitive effectiveness of **MRR@10 37.22** on ID data (& OOD),
   - and a retrieval latency of **47.27 ms** (multi-threaded),
   - on a **mono-GPU** with **only 5 negatives per query**.
+4. For the industry setting:
    - Effectiveness on custom domains needs more than just **trading FLOPS for tiny gains**.
    - The premise "SPLADE++ is not well suited to mono-CPU retrieval" does not hold.

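For concreteness, here is a minimal sketch of the asymmetric token budget above, using the standard Hugging Face tokenizer API. It illustrates the budgeting idea only; it is not our exact training pipeline, and the sample texts are placeholders.

```python
# Illustration only: enforce the asymmetric budgets named above,
# 128 tokens for documents vs. 24 for queries, with one shared tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # same base size as the models compared here

docs = ["SPLADE expands a passage into a sparse bag of weighted vocabulary terms."]
queries = ["what is splade"]

doc_batch = tokenizer(docs, max_length=128, truncation=True, padding=True, return_tensors="pt")
query_batch = tokenizer(queries, max_length=24, truncation=True, padding=True, return_tensors="pt")
print(doc_batch["input_ids"].shape, query_batch["input_ids"].shape)
```

Shorter query budgets shrink both training cost and online query-encoding latency, which is where the mono-CPU argument above comes from.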
+<img src="./ID.png" width=500 height=350/>

*Note: The paper refers to the best-performing models as SPLADE++, hence for consistency we reuse the same name.*

+[**JUMP TO "How to use" to try it out**](#htu) or continue for more details.
+
+<br/>
+
## Why is FLOPS one of the key metrics for an industry setting?

+While only an empirical analysis on a large sample is truly meaningful, here is a spot check - a qualitative example to give you an idea: our models achieve competitive effectiveness while using roughly ~10% and ~100% fewer tokens than comparable SPLADE++ models, including the SoTA.
(We will show quantitative results in the next section.)

So, **by design, "how to beat SoTA MRR?" was never our goal**; instead, "at what cost can we achieve an acceptable effectiveness, i.e. MRR@10?". Nonchalantly reducing the lambda values (λQ, λD; see the table above) will achieve a better MRR.
But lower lambda values = higher FLOPS = more tokens = poorer efficiency. This is NOT desirable for an industry setting.

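For reference, the FLOPS regularizer behind this trade-off can be sketched as below, following the formulation popularized by the SPLADE papers. The λ values and batch shapes are placeholders, not our trained settings.

```python
# Sketch of the SPLADE-style FLOPS regularizer: the squared mean activation
# of each vocabulary dimension, summed over the vocabulary. Heavier lambda_q /
# lambda_d push more dimensions toward zero (sparser reps, lower FLOPS).
import torch

def flops_loss(reps: torch.Tensor) -> torch.Tensor:
    """reps: (batch, vocab_size) non-negative sparse term weights."""
    return torch.sum(torch.mean(reps, dim=0) ** 2)

query_reps = torch.rand(8, 30522)   # dummy batch over the BERT vocab
doc_reps = torch.rand(8, 30522)
lambda_q, lambda_d = 5e-4, 3e-4     # placeholder weights
reg = lambda_q * flops_loss(query_reps) + lambda_d * flops_loss(doc_reps)
print(reg.item())
```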
+<details>
+
**Ours**
```python
number of actual dimensions: 113
```

- *Note 1: This specific passage was used as an example for [ease of comparison](https://github.com/naver/splade/blob/main/inference_splade.ipynb).*

+</details>
+
## How does it translate into Empirical metrics?

Our models are token-sparse and yet effective. This translates to faster retrieval (user experience) and a smaller index size ($). Mean retrieval time on the standard MS MARCO small dev set and scaled total FLOPS loss are the respective metrics, shown below.

<img src="./Metrics.png" width=800/>

+<details>
+
**Note: Why Anserini and not PISA?** *Anserini is a production-ready Lucene-based library. Common industry search deployments use Solr or Elasticsearch, which are Lucene-based, hence the performance is comparable. PISA latency is irrelevant for industry as it is a research-only system.*
The full [anserini evaluation log](https://huggingface.co/DOST/SPLADEplusplus_EN/blob/main/anserini_run.log), with encoding, indexing and querying details, is here.

**Our model differs in a few more aspects**
- **CoCondenser weights**: Unlike the best official SPLADE++ or SparseEmbed, we do NOT initialize weights from Luyu/co-condenser* models. Yet we achieve CoCondenser-SPLADE-level performance. More on this later.
- **Same-size models:** Official SPLADE++, SparseEmbed and ours all fine-tune a base model of the same size as `bert-base-uncased`.
+</details>

## Roadmap and future directions for Industry Suitability

Of course, if that doesn't matter, you could always use these models with the Hugging Face transformers library, as sketched below.

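As a minimal sketch of that route, the snippet below assumes the standard SPLADE readout (log-saturated ReLU over the MLM logits, max-pooled over the sequence); the repo id comes from the evaluation-log link above, and the snippet actually shipped with the model may differ.

```python
# Sketch: produce a sparse BOW rep like the "Ours" example earlier.
# Assumes the standard SPLADE readout; not necessarily the exact shipped code.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "DOST/SPLADEplusplus_EN"  # taken from the evaluation-log link above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("what is the capital of france", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # (1, seq_len, vocab_size)

# log-saturated ReLU, then max over the sequence: one weight per vocab term
weights = torch.log1p(torch.relu(logits)).amax(dim=1).squeeze(0)
nonzero = weights.nonzero().squeeze(1)
print("number of actual dimensions:", nonzero.numel())
bow = {tokenizer.decode([i.item()]): round(weights[i].item(), 2) for i in nonzero}
print("SPLADE BOW rep:", dict(sorted(bow.items(), key=lambda kv: -kv[1])[:10]))
```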
+<h1 id="htu">How to use?</h1>
+

## With SPLADERunner Library