Update README.md
README.md CHANGED
@@ -237,9 +237,13 @@ We compare BGE-M3 with some popular methods, including BM25, openAI embedding, e
 - NarritiveQA:
 ![avatar](./imgs/nqa.jpg)
 
-- BM25
+- Comparison with BM25
 
 We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+We tested BM25 using two different tokenizers:
+one using Lucene Analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta).
+The results indicate that BM25 remains a competitive baseline,
+especially in long document retrieval.
 
 ![avatar](./imgs/bm25.jpg)
 
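For context on the added lines, here is a minimal sketch of how the two BM25 variants could be queried with Pyserini. The index paths, query, and BM25 parameters are assumptions for illustration; the linked MLDR script is the authoritative reproduction.

```python
# Sketch: BM25 via Pyserini with (a) the Lucene analyzer and (b) text
# pretokenized by the xlm-roberta tokenizer. Index paths are hypothetical.
from pyserini.analysis import JWhiteSpaceAnalyzer
from pyserini.search.lucene import LuceneSearcher
from transformers import AutoTokenizer

query = "what is multilingual dense retrieval?"

# (a) BM25 over an index built with Lucene's default analyzer.
lucene_searcher = LuceneSearcher("indexes/mldr-en-lucene")  # hypothetical path
lucene_searcher.set_bm25(k1=0.9, b=0.4)  # Pyserini's default BM25 parameters
hits = lucene_searcher.search(query, k=10)

# (b) BM25 over a pretokenized index: queries (like the documents at index
# time) are segmented with the same tokenizer M3 uses, then space-joined so
# Lucene treats each subword as a term.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
xlmr_searcher = LuceneSearcher("indexes/mldr-en-xlmr")  # hypothetical path
xlmr_searcher.set_analyzer(JWhiteSpaceAnalyzer())  # don't re-analyze the query
xlmr_searcher.set_bm25(k1=0.9, b=0.4)
hits = xlmr_searcher.search(" ".join(tokenizer.tokenize(query)), k=10)

for hit in hits[:3]:
    print(hit.docid, round(hit.score, 3))
```

Note that variant (b) only makes sense if the index itself was built from the same space-joined subword tokens (Pyserini supports this via a pretokenized indexing option), which is the setup the linked script describes.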