Cohere embed-multilingual-v3.0

This repository contains the tokenizer for the Cohere embed-multilingual-v3.0 model. See our blog post, Cohere Embed V3, for more details on this model.

You can use the embedding model via the Cohere API, via AWS SageMaker, or in a private deployment.

Usage Cohere API

The following code snippet shows the usage of the Cohere API. Install the cohere SDK via:

pip install -U cohere

Get your free API key at www.cohere.com

# This snippet shows an example of how to use the Cohere Embed V3 models for semantic search.
# Make sure you have the Cohere SDK installed in at least v4.30: pip install -U cohere
# Get your API key from: www.cohere.com
import cohere
import numpy as np

cohere_key = "{YOUR_COHERE_API_KEY}"   #Get your API key from www.cohere.com
co = cohere.Client(cohere_key)

docs = ["The capital of France is Paris",
        "PyTorch is a machine learning framework based on the Torch library.",
        "The average cat lifespan is between 13-17 years"]


#Encode your documents with input type 'search_document'
doc_emb = co.embed(texts=docs, input_type="search_document", model="embed-multilingual-v3.0").embeddings
doc_emb = np.asarray(doc_emb)


#Encode your query with input type 'search_query'
query = "What is Pytorch"
query_emb = co.embed(texts=[query], input_type="search_query", model="embed-multilingual-v3.0").embeddings
query_emb = np.asarray(query_emb)
query_emb.shape

#Compute the dot product between query embedding and document embedding
scores = np.dot(query_emb, doc_emb.T)[0]

#Find the highest scores
max_idx = np.argsort(-scores)

print(f"Query: {query}")
for idx in max_idx:
  print(f"Score: {scores[idx]:.2f}")
  print(docs[idx])
  print("--------")
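
The scoring step above (dot product followed by a descending sort) can be tried without an API key. The following is a minimal sketch, not part of the Cohere SDK: the helper name top_k_by_dot_product and the toy 3-dimensional vectors are made up for illustration and simply stand in for real embeddings returned by co.embed.

```python
import numpy as np

def top_k_by_dot_product(query_emb, doc_emb, k=3):
    """Hypothetical helper: return (index, score) pairs for the k best documents."""
    query_emb = np.asarray(query_emb, dtype=float).reshape(-1)
    doc_emb = np.asarray(doc_emb, dtype=float)
    scores = doc_emb @ query_emb       # one dot product per document
    order = np.argsort(-scores)[:k]    # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real embeddings (assumed values, illustration only).
toy_docs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.6, 0.8, 0.0]])
toy_query = np.array([0.0, 1.0, 0.0])

ranking = top_k_by_dot_product(toy_query, toy_docs, k=2)
for idx, score in ranking:
    print(f"doc {idx}: score {score:.2f}")
```

With real embeddings, you would pass query_emb and doc_emb from the snippet above in place of the toy arrays; limiting the result to k entries is a design choice for larger document sets, where printing every score is rarely useful.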

Usage AWS SageMaker

The embedding model can be privately deployed in your AWS Cloud using our AWS SageMaker marketplace offering. It runs privately in your VPC, with latencies as low as 5 ms for query encoding.

Usage AWS Bedrock

The model will soon also be available via AWS Bedrock. Stay tuned.

Private Deployment

Want to run the model on your own hardware? Contact Sales to learn more.

Supported Languages

This model was trained on nearly 1B English training pairs and nearly 0.5B non-English training pairs from 100+ languages.

Evaluation results can be found in the Embed V3.0 Benchmark Results spreadsheet.
