mixedbread-ai/mxbai-embed-2d-large-v1

The crispy sentence embedding family from Mixedbread.


🪆mxbai-embed-2d-large-v1🪆

This is our 2DMSE sentence embedding model. It supports adaptive transformer layers and adaptive embedding sizes. Find out more in our blog post.

TLDR: 2D-🪆 allows you to shrink both the model (number of layers) and the embedding size. Shrinking only the embeddings yields results competitive with other models, such as Nomic's embedding model. Shrinking the model to ~50% maintains up to 85% of the performance without further training.

Quickstart

Here, we provide several ways to produce sentence embeddings with adaptive layers and embedding sizes. For this version, we recommend keeping between 20 and 24 layers.
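Python slicing silently ignores out-of-range bounds, so a mistyped layer count or embedding size can go unnoticed. The following is a minimal sketch of a guard you could run before applying a configuration. `check_2d_config` is a hypothetical helper, not part of any library, and the defaults of 24 layers and 1024 dimensions are assumptions about the full-size model that you should verify against the model config.

```python
def check_2d_config(num_layers, embedding_size, max_layers=24, max_dim=1024,
                    recommended_min_layers=20):
    """Validate a (layers, embedding size) pair and report how much is kept.

    max_layers and max_dim are ASSUMED sizes of the full model;
    recommended_min_layers mirrors the 20-to-24 recommendation above.
    """
    if not 1 <= num_layers <= max_layers:
        raise ValueError(f"num_layers must be in [1, {max_layers}], got {num_layers}")
    if not 1 <= embedding_size <= max_dim:
        raise ValueError(f"embedding_size must be in [1, {max_dim}], got {embedding_size}")
    return {
        "layer_fraction": num_layers / max_layers,
        "dim_fraction": embedding_size / max_dim,
        "in_recommended_range": num_layers >= recommended_min_layers,
    }


report = check_2d_config(num_layers=22, embedding_size=768)
print(report)
```

A value outside the assumed limits, such as `check_2d_config(30, 768)`, raises a `ValueError` instead of quietly keeping all layers.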

sentence-transformers

Currently, the best way to use our models is with the most recent version of sentence-transformers.

python -m pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# 1. load model with `cls` pooling
model = SentenceTransformer("mixedbread-ai/mxbai-embed-2d-large-v1")

# 2. set adaptive layer and embedding size.
# it is recommended to set layers from 20 to 24.
new_num_layers = 22  # 1D: set layer size
model[0].auto_model.encoder.layer = model[0].auto_model.encoder.layer[:new_num_layers]
new_embedding_size = 768  # 2D: set embedding size

# 3. encode
embeddings = model.encode(
    [
        'Who is german and likes bread?',
        'Everybody in Germany.'
    ]
)

# Similarity of the first sentence with the second one,
# using only the first `new_embedding_size` dimensions
similarities = cos_sim(embeddings[0, :new_embedding_size], embeddings[1, :new_embedding_size])
print('similarities:', similarities)
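Cosine similarity does not depend on vector length, so comparing truncated embeddings with `cos_sim` works without rescaling. If you instead store truncated embeddings in an index that scores with a plain dot product, you will likely want to rescale them to unit length first. The sketch below shows this on toy vectors using only the standard library. `truncate_and_normalize` and `dot` are hypothetical helpers, and the 4-dimensional vectors merely stand in for real embeddings.

```python
import math


def truncate_and_normalize(vector, size):
    """Keep the first `size` components and rescale them to unit length."""
    head = list(vector[:size])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for real embeddings (illustrative values only).
query = [3.0, 4.0, 1.0, 2.0]
doc = [4.0, 3.0, 5.0, 0.0]

q2 = truncate_and_normalize(query, 2)
d2 = truncate_and_normalize(doc, 2)

# After normalization, the dot product equals the cosine similarity
# of the truncated vectors.
score = dot(q2, d2)
print(round(score, 4))
```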

angle-emb

You can also use the latest angle-emb for inference, as follows:

python -m pip install -U angle-emb

from angle_emb import AnglE
from sentence_transformers.util import cos_sim

# 1. load model
model = AnglE.from_pretrained("mixedbread-ai/mxbai-embed-2d-large-v1", pooling_strategy='cls').cuda()

# 2. set adaptive layer and embedding size.
# it is recommended to set layers from 20 to 24.
layer_index = 22  # 1d: layer
embedding_size = 768  # 2d: embedding size

# 3. encode
embeddings = model.encode([
    'Who is german and likes bread?',
    'Everybody in Germany.'
], layer_index=layer_index, embedding_size=embedding_size)

similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)

Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @xenova/transformers

You can then use the model to compute embeddings as follows:

import { pipeline, cos_sim } from '@xenova/transformers';

// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'mixedbread-ai/mxbai-embed-2d-large-v1', {
    quantized: false, // (Optional) remove this line to use the 8-bit quantized model
});

// Compute sentence embeddings (with `cls` pooling)
const sentences = ['Who is german and likes bread?', 'Everybody in Germany.'];
const output = await extractor(sentences, { pooling: 'cls' });

// Set embedding size and truncate embeddings
const new_embedding_size = 768;
const truncated = output.slice(null, [0, new_embedding_size]);

// Compute cosine similarity
console.log(cos_sim(truncated[0].data, truncated[1].data)); // 0.6979532021425204

Using API

You can use the model via our API as follows:

from mixedbread_ai.client import MixedbreadAI
from sklearn.metrics.pairwise import cosine_similarity
import os

mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")

english_sentences = [
    'What is the capital of Australia?',
    'Canberra is the capital of Australia.'
]

res = mxbai.embeddings(
    input=english_sentences,
    model="mixedbread-ai/mxbai-embed-2d-large-v1",
    dimensions=512,
)

embeddings = [entry.embedding for entry in res.data]

similarities = cosine_similarity([embeddings[0]], [embeddings[1]])
print(similarities)

The API comes with native INT8 and binary quantization support! Check out the docs for more information.
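To give a rough idea of what these two quantization schemes mean for an embedding, here is a minimal standard-library sketch. It illustrates the general techniques only and is not the API's implementation. The helper names, the sign-based threshold, and the `max_abs` calibration value are all assumptions made for the example.

```python
def binary_quantize(vector):
    """Map each component to one bit: 1 if positive, else 0 (assumed threshold)."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_distance(bits_a, bits_b):
    """Count positions where two bit lists differ (lower means more similar)."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def int8_quantize(vector, max_abs):
    """Linearly map [-max_abs, max_abs] to integers in [-127, 127], clipping outliers."""
    scale = 127.0 / max_abs
    return [max(-127, min(127, round(x * scale))) for x in vector]


# Toy vectors standing in for real embeddings (illustrative values only).
a = [0.75, -0.25, 0.0, 1.0]
b = [0.4, 0.3, -0.1, 0.9]

bits_a = binary_quantize(a)
bits_b = binary_quantize(b)
distance = hamming_distance(bits_a, bits_b)

int8_a = int8_quantize(a, max_abs=1.0)

print(bits_a, bits_b, distance)
print(int8_a)
```

Binary vectors take one bit per dimension and can be compared with Hamming distance, while INT8 vectors take one byte per dimension and keep more of the original ranking quality. Which trade-off fits best depends on your storage and accuracy requirements.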

Evaluation

Please find more information in our blog post.

Community

Please join our Discord Community and share your feedback and thoughts! We are here to help and also always happy to chat.

License

Apache 2.0
