News

11/12/2024: Release of Algolia/Algolia-large-multilang-generic-v2410, Algolia's multilingual embedding model.

Models

Algolia-large-multilang-generic-v2410 is the first addition to Algolia's suite of multilingual embedding models built for retrieval performance and efficiency in e-commerce search. Algolia v2410 models are state-of-the-art for their size and use cases, and are now available under the MIT license.

Note that generic models are trained on public and synthetic e-commerce datasets only.

Usage

Prefix every query with "query: ". Documents need no prefix or instruction.
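Because only queries take the prefix, it can help to apply it in one place. The sketch below is a minimal illustration; `QUERY_PREFIX`, `prepare_query` and `prepare_documents` are hypothetical helper names, not part of any Algolia or Sentence Transformers API. It assumes the prefix should not be added twice to text that already carries it.

```python
# Hypothetical helpers: the model card only says to prefix queries with "query: "
QUERY_PREFIX = "query: "


def prepare_query(text: str) -> str:
    """Prepend the "query: " prefix unless the text already starts with it."""
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def prepare_documents(docs: list[str]) -> list[str]:
    """Documents are embedded as-is, with no prefix or instruction."""
    return list(docs)


queries = [prepare_query(q) for q in ["running shoes", "query: hiking boots"]]
print(queries)  # ['query: running shoes', 'query: hiking boots']
```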

Using Sentence Transformers

# Load the model (its tokenizer is loaded along with it)
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-multilang-generic-v2410"
model = SentenceTransformer(modelname)

# Embed a single text and return its vector
def get_embedding(text):
    embedding = model.encode([text])
    return embedding[0]

# Rank documents by cosine similarity to the query
def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Cosine similarity = 1 - cosine distance
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(float(sim), 4)} for doc, sim in ranked_docs]

# Define inputs: the query takes the "query: " prefix, documents do not
query = "query: " + "running shoes"
documents = ["adidas sneakers, great for outdoor running",
             "nike soccer boots indoor, it can be used on turf",
             "new balance light weight, good for jogging",
             "hiking boots, good for bushwalking"
            ]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
print(result_df)
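The example above embeds texts one at a time and scores them with SciPy. As a sketch of a vectorized alternative (a swapped-in technique, not Algolia's method), the hypothetical `rank_by_cosine` function below normalizes the vectors and ranks all documents with one matrix product. It assumes the query embedding is a 1-D array and the document embeddings form a 2-D array with one row per document; toy 2-D vectors stand in for real model output.

```python
import numpy as np


def rank_by_cosine(query_emb, doc_embs, documents):
    """Hypothetical helper: rank documents by cosine similarity to a query.

    query_emb: shape (d,); doc_embs: shape (n, d), one row per document.
    """
    q = np.asarray(query_emb, dtype=np.float64)
    d = np.asarray(doc_embs, dtype=np.float64)
    # Scale to unit length so a dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)  # highest similarity first
    return [
        {"document": documents[i], "similarity_score": round(float(scores[i]), 4)}
        for i in order
    ]


# Toy vectors standing in for real embeddings
docs = ["a", "b", "c"]
ranked = rank_by_cosine([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], docs)
print([r["document"] for r in ranked])  # ['b', 'c', 'a']
```

With the model loaded as above, `model.encode(query)` returns a single vector and `model.encode(documents)` returns one row per document, so both can be passed to this function directly, replacing the per-document encoding loop with one batched call.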

Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email the Algolia AI research team at ai-research@algolia.com.

License

Algolia Multilang v2410 is licensed under the MIT License. The released models can be used for commercial purposes free of charge.

Model details

Weights format: Safetensors
Model size: 560M params
Tensor type: F32