Shitao committed on
Commit
3ab7155
1 Parent(s): f2dfce0

Update README.md

  1. README.md +5 -1
README.md CHANGED
@@ -237,9 +237,13 @@ We compare BGE-M3 with some popular methods, including BM25, openAI embedding, e
  - NarritiveQA:
  ![avatar](./imgs/nqa.jpg)

- - BM25
+ - Comparison with BM25

  We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+ We tested BM25 using two different tokenizers:
+ one using Lucene Analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta).
+ The results indicate that BM25 remains a competitive baseline,
+ especially in long document retrieval.

  ![avatar](./imgs/bm25.jpg)
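
The added README lines describe running BM25 with two different tokenizers. As a rough illustration of what that means, here is a minimal pure-Python BM25 sketch with a pluggable tokenizer. It is not the Pyserini-based script linked above: the class `BM25`, the helper `whitespace_tokenize`, and the `k1`/`b` values are assumptions chosen for the example.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# A tokenizer is any callable that maps a string to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class BM25:
    """Illustrative BM25 scorer; k1 and b are assumed values, not tuned ones."""

    def __init__(self, docs: Sequence[str], tokenize: Tokenizer,
                 k1: float = 0.9, b: float = 0.4) -> None:
        self.tokenize = tokenize
        self.k1, self.b = k1, b
        # Per-document term frequencies and lengths (in tokens).
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.avgdl = (sum(self.doc_lens) / len(self.doc_lens)) or 1.0
        # Document frequency: in how many documents each term appears.
        df: Counter = Counter()
        for tf in self.doc_tfs:
            df.update(tf.keys())
        n = len(self.doc_tfs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5))
                    for t, f in df.items()}

    def score(self, query: str, doc_index: int) -> float:
        """BM25 score of one document for the query."""
        tf = self.doc_tfs[doc_index]
        length_norm = self.k1 * (
            1 - self.b + self.b * self.doc_lens[doc_index] / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + length_norm)
            for t in self.tokenize(query) if t in tf)

    def rank(self, query: str) -> List[int]:
        """Document indices sorted from highest to lowest score."""
        return sorted(range(len(self.doc_tfs)),
                      key=lambda i: self.score(query, i), reverse=True)


docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "long document retrieval with bm25",
]
bm25 = BM25(docs, tokenize=whitespace_tokenize)
ranking = bm25.rank("bm25 retrieval")
```

To approximate the second setting, one could pass a subword tokenizer instead, for example `AutoTokenizer.from_pretrained("xlm-roberta-base").tokenize` from the `transformers` library. That model id is an assumption for illustration, and the scoring code stays unchanged.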