At @jinaai, we've recently launched a new model: jinaai/jina-colbert-v1-en. In this post, I'd like to give you a quick introduction to ColBERT: the multi-vector, late-interaction retriever.
As you may already know, we've been developing embedding models such as jinaai/jina-embeddings-v2-base-en for some time. These models, often called 'dense retrievers', generate a single representation for each document.
Embedding models like Jina-v2 have the advantage of quick integration with vector databases and good performance within a specific domain.
Within a specific domain, embedding models can perform very well because they have "seen similar distributions" during training. The flip side is that they may perform only "okay" on out-of-domain tasks and require fine-tuning.
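To make the "single representation" idea concrete, here is a minimal sketch of dense retrieval with toy numbers (the arrays stand in for real model outputs, and 2 dimensions replace the actual 768): all token embeddings are pooled into one vector per text, and relevance is a single cosine similarity between the two pooled vectors.

```python
import numpy as np

# Toy stand-ins for model outputs; a dense retriever collapses all token
# embeddings into ONE vector per text (here via mean pooling).
query_token_embs = np.array([[0.1, 0.9], [0.8, 0.2]])            # 2 query tokens
doc_token_embs = np.array([[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]])  # 3 doc tokens

def mean_pool(token_embs: np.ndarray) -> np.ndarray:
    """Collapse token embeddings into a single dense vector."""
    return token_embs.mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

score = cosine(mean_pool(query_token_embs), mean_pool(doc_token_embs))
print(score)
```

Note that all token-level detail is gone by the time the similarity is computed: only one vector per text survives, which is exactly what ColBERT changes.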
Now, let's delve into multi-vector search and late-interaction models. The idea is quite simple:
1. During training, a linear projection reduces each token embedding's dimensionality from 768 to 128 to save storage.
2. At scoring time, given one query and one document, match each query token embedding against every token embedding in the document and keep the maximum similarity score. Repeat this for every token in the query, from the first to the last, then sum up all the maximum similarity scores.
This process is called multi-vector search because if your query has 5 tokens, you're keeping 5 token embeddings of 128 dimensions each (5 × 128 numbers) rather than one vector. The "max similarity sum-up" procedure is termed late interaction (MaxSim in the ColBERT paper).
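The MaxSim scoring described above can be sketched in a few lines (toy 2-dimensional embeddings here instead of ColBERT's 128, chosen for readability; real implementations also normalize the vectors):

```python
import numpy as np

# Per-token embeddings, one row per token (dim 2 for readability).
query_embs = np.array([[1.0, 0.0], [0.0, 1.0]])            # 2 query tokens
doc_embs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])  # 3 doc tokens

def maxsim_score(q: np.ndarray, d: np.ndarray) -> float:
    # (num_query_tokens, num_doc_tokens) matrix of dot-product similarities.
    sim = q @ d.T
    # For each query token, keep only its best-matching document token,
    # then sum these maxima over all query tokens.
    return float(sim.max(axis=1).sum())

print(maxsim_score(query_embs, doc_embs))
```

Because the query-document interaction happens only at this final cheap step (after both sides are encoded independently), document embeddings can be precomputed and indexed offline, which is what makes the "late" in late interaction practical.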
Multi-vector & Late interaction retrievers have the advantage of:
1. Strong out-of-domain performance, since they match at a token-level granularity.
2. Explainability: you can inspect the token-level matches and understand why a score is higher or lower.