#1
by nickmuchi - opened

Wondering how I can use this model for semantic search?

Please refer to https://github.com/microsoft/unilm/blob/master/e5/mteb_beir_eval.py#L43-L91 for an implementation.

The overall idea is to first encode and index the corpus texts with the "passage: " prefix, and then encode each query with the "query: " prefix. Once you have the text embeddings, any existing approximate nearest neighbor tool, such as Faiss, can be used for semantic search.
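
To make the flow concrete, here is a minimal sketch of the prefix-then-search idea. It is not the code from the linked script: `toy_embed` is a placeholder bag-of-words embedder used only so the example runs without downloading a model, and `build_index` and `search` are hypothetical helper names. In practice you would swap `toy_embed` for the E5 model's encoding function and replace the brute-force scoring loop with Faiss or a similar tool.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

SparseVec = Dict[str, float]


def toy_embed(text: str) -> SparseVec:
    """Placeholder embedder (NOT E5): L2-normalized bag of lowercase tokens."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product of two sparse vectors; equals cosine similarity when both are normalized."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())


def build_index(
    passages: Sequence[str],
    embed: Callable[[str], SparseVec] = toy_embed,
) -> List[SparseVec]:
    """Embed every corpus text with the 'passage: ' prefix."""
    return [embed("passage: " + p) for p in passages]


def search(
    query: str,
    index: Sequence[SparseVec],
    k: int = 3,
    embed: Callable[[str], SparseVec] = toy_embed,
) -> List[Tuple[int, float]]:
    """Embed the query with the 'query: ' prefix and return the top-k (position, score) pairs."""
    q = embed("query: " + query)
    scored = [(i, dot(q, vec)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


corpus = [
    "Faiss is a library for nearest neighbor search",
    "E5 models are trained with prefixes",
    "Bananas are yellow",
]
index = build_index(corpus)
hits = search("nearest neighbor search library", index, k=2)
for position, score in hits:
    print(f"{score:.3f}  {corpus[position]}")
```

The important part is only that passages and queries get different prefixes before encoding. The exhaustive scoring here is fine for a handful of texts; for a large corpus an approximate nearest neighbor index is the usual choice.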

Thanks for the response. I used the SentenceTransformer package with the e5-base model and managed to generate embeddings; the similarity scores were pretty much the same as in the model card example. I did not include the prefixes you mentioned, though. How critical are they for generating embeddings? Thanks for your help.

The E5 models are trained with these prefixes, so it is better to include them at inference time as well. I am not sure how critical they are, but I would certainly expect some performance degradation without them.

Sounds good, thank you!

nickmuchi changed discussion status to closed