Performance on MTEB
Hi,
I was wondering whether using this model for both text and image embeddings would degrade text performance; from the benchmarks it's not quite clear where it stands on MTEB.
Could you shed some light on this? Is it better or worse than, for example, intfloat/e5-mistral-7b-instruct?
Thanks for your help.
Cheers,
Jaro
This is a great question. We haven't tested it yet, but it is part of our plan.
I expect the results on MTEB may not be as strong as those of current state-of-the-art text embedding models, since we haven't trained on any text-only data. One of our key next steps is to train a model on a combination of text pairwise data and our current image pairwise data. Based on insights from other work (such as E5-V), we believe that incorporating more text pairwise data could also benefit image-related tasks.
Thanks for your answer. Sounds like a great plan - and cheers for the great work on the embedding model!