---
license: mit
language:
- en
- fr
- de
- es
- ru
base_model:
- OrdalieTech/Solon-embeddings-large-0.1
---
## News
11/12/2024: Release of Algolia/Algolia-large-multilang-generic-v2410, Algolia's multilingual embedding model.

## Models
Algolia-large-multilang-generic-v2410 is the first addition to Algolia's suite of multilingual embedding models, built for retrieval performance and efficiency in e-commerce search.
The Algolia v2410 models are state-of-the-art for their size and use cases, and are now available under an MIT license.

### Quality Benchmarks
|Model|MTEB EN rank|Public e-comm rank|Algolia private e-comm rank|
|---|---|---|---|
|Algolia-large-multilang-generic-v2410|21|12|5|

Note that our benchmarks cover the retrieval task only, and include open-source models of approximately 500M parameters or smaller as well as commercially available embedding models.

## Usage

### Using Sentence Transformers
```python
# Load the model
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer
import pandas as pd

model_name = "algolia/algolia-large-multilang-generic-v2410"
model = SentenceTransformer(model_name)

# Define embedding and similarity helpers
def get_embedding(text):
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Cosine similarity = 1 - cosine distance
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    # Rank documents by descending similarity
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]

# Define inputs (queries use the "query: " prefix)
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
print(result_df.head())
```
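For illustration, the ranking logic above can be reduced to a minimal self-contained sketch that needs no model download. The toy 2-D vectors and document labels below are hypothetical stand-ins for real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(a, b) = a·b / (|a| |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_documents(query_emb, doc_embeddings, documents):
    # Score every document against the query, then sort by descending similarity
    sims = [cosine_similarity(query_emb, e) for e in doc_embeddings]
    return sorted(zip(documents, sims), key=lambda x: x[1], reverse=True)

# Toy data: the first document vector points almost the same way as the query
query_emb = np.array([1.0, 0.0])
doc_embeddings = [np.array([1.0, 0.1]), np.array([0.0, 1.0])]
docs = ["close match", "unrelated"]

ranked = rank_documents(query_emb, doc_embeddings, docs)
print(ranked[0][0])  # the near-parallel vector ranks first
```

With real embeddings, only `query_emb` and `doc_embeddings` change; the ranking step is identical.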

## Contact
Feel free to open an issue or a pull request if you have any questions or suggestions about this project.
You can also email Rasit Abay (rasit.abay@algolia.com).

## License
The Algolia v2410 models are licensed under the [MIT license](https://mit-license.org/). The released models can be used for commercial purposes free of charge.