Update README.md
README.md
CHANGED
@@ -1,3 +1,67 @@
---
license: mit
language:
- en
- fr
- de
- es
- ru
base_model:
- OrdalieTech/Solon-embeddings-large-0.1
---

## News
11/12/2024: Release of Algolia/Algolia-large-multilang-generic-v2410, Algolia's multilingual embedding model.

## Models
Algolia-large-multilang-generic-v2410 is the first addition to Algolia's suite of multilingual embedding models, built for retrieval performance and efficiency in e-commerce search.
The Algolia v2410 models are state-of-the-art for their size and use cases, and are now available under an MIT license.

### Quality Benchmarks

|Model|MTEB EN rank|Public e-comm rank|Algolia private e-comm rank|
|---|---|---|---|
|Algolia-large-multilang-generic-v2410|21|12|5|

Note that our benchmarks cover the retrieval task only and include open-source models of approximately 500M parameters and smaller, as well as commercially available embedding models.
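
For reference, retrieval-only MTEB numbers can be reproduced with the open-source `mteb` package. The snippet below is a minimal sketch, not our exact benchmark harness: the task selection is illustrative, and the `MTEB(task_types=..., task_langs=...)` constructor assumes an older `mteb` release (newer versions select tasks via `mteb.get_tasks`).

```python
# Illustrative sketch: run English MTEB retrieval tasks against the model.
# The task set here is NOT Algolia's exact benchmark; adjust as needed.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("algolia/algolia-large-multilang-generic-v2410")
evaluation = MTEB(task_types=["Retrieval"], task_langs=["en"])
evaluation.run(model, output_folder="results")
```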

## Usage

### Using Sentence Transformers
```python
# Load the model
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-multilang-generic-v2410"
model = SentenceTransformer(modelname)

# Define embedding and compute_similarity helpers
def get_embedding(text):
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Calculate cosine similarity between the query and each document
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]

# Define inputs (the query is prefixed with "query: ")
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
result_df.head()
```
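
The helper above encodes one text at a time; Sentence Transformers can also batch-encode and score in one step. A minimal equivalent sketch, reusing the `model`, `query`, and `documents` defined above:

```python
# Alternative: batch-encode and rank with sentence_transformers.util.cos_sim.
from sentence_transformers import util

query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_embs)[0]  # one cosine score per document
for doc, score in sorted(zip(documents, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```

Batch encoding avoids one forward pass per document, which matters once the candidate set grows beyond a handful of items.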

## Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Rasit Abay (rasit.abay@algolia.com).

## License
The Algolia v2410 models are licensed under the [MIT license](https://mit-license.org/). The released models can be used for commercial purposes free of charge.