Update README.md
README.md CHANGED
@@ -237,9 +237,13 @@ We compare BGE-M3 with some popular methods, including BM25, openAI embedding, e
 - NarritiveQA:
 ![avatar](./imgs/nqa.jpg)
 
-- BM25
+- Comparison with BM25
 
 We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+We tested BM25 using two different tokenizers:
+one using Lucene Analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta).
+The results indicate that BM25 remains a competitive baseline,
+especially in long document retrieval.
 
 ![avatar](./imgs/bm25.jpg)
 
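For context on the added lines, here is a minimal sketch of how the two BM25 variants could be queried with Pyserini. The index paths, query, and BM25 parameters are assumptions for illustration; the linked MLDR script is the authoritative reproduction.

```python
# Sketch: BM25 via Pyserini with (a) the Lucene analyzer and (b) text
# pretokenized by the xlm-roberta tokenizer. Index paths are hypothetical.
from pyserini.analysis import JWhiteSpaceAnalyzer
from pyserini.search.lucene import LuceneSearcher
from transformers import AutoTokenizer

query = "what is multilingual dense retrieval?"

# (a) BM25 over an index built with Lucene's default analyzer.
lucene_searcher = LuceneSearcher("indexes/mldr-en-lucene")  # hypothetical path
lucene_searcher.set_bm25(k1=0.9, b=0.4)  # Pyserini's default BM25 parameters
hits = lucene_searcher.search(query, k=10)

# (b) BM25 over a pretokenized index: queries (like the documents at index
# time) are segmented with the same tokenizer M3 uses, then space-joined so
# Lucene treats each subword as a term.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
xlmr_searcher = LuceneSearcher("indexes/mldr-en-xlmr")  # hypothetical path
xlmr_searcher.set_analyzer(JWhiteSpaceAnalyzer())  # don't re-analyze the query
xlmr_searcher.set_bm25(k1=0.9, b=0.4)
hits = xlmr_searcher.search(" ".join(tokenizer.tokenize(query)), k=10)

for hit in hits[:3]:
    print(hit.docid, round(hit.score, 3))
```

Note that variant (b) only makes sense if the index itself was built from the same space-joined subword tokens (Pyserini supports this via a pretokenized indexing option), which is the setup the linked script describes.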