TJKlein committed on
Commit c83ff03
1 Parent(s): e9ec569

Update README.md

Files changed (1)
  1. README.md +37 -3
README.md CHANGED
@@ -68,7 +68,6 @@ cos_sim = sim(embeddings.unsqueeze(1),
  embeddings.unsqueeze(0))
 
 print(f"Distance: {cos_sim[0,1].detach().item()}")
-
 ```
 
 ## Example 2) - Clustering
@@ -144,11 +143,47 @@ umap_model.fit(embeddings)
 
 # Plot result
 umap_plot.points(umap_model, labels = np.array(classes),theme='fire')
-
 ```
 
 ![UMAP Cluster](https://raw.githubusercontent.com/TJKlein/tjklein.github.io/master/images/miCSE_UMAP_small2.png)
 
+
+## Example 3) - Using [SentenceTransformers](https://www.sbert.net/)
+
+```python
+from sentence_transformers import SentenceTransformer, util
+from sentence_transformers import models
+import torch.nn as nn
+
+# Using the model with [CLS] embeddings
+model_name = 'sap-ai-research/miCSE'
+word_embedding_model = models.Transformer(model_name, max_seq_length=32)
+pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
+model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
+
+# Using cosine similarity as metric
+cos_sim = nn.CosineSimilarity(dim=-1)
+
+# Lists of sentences for comparison
+sentences_1 = ["This is a sentence for testing miCSE.",
+               "This is using mutual information Contrastive Sentence Embeddings model."]
+
+sentences_2 = ["This is testing miCSE.",
+               "Similarity with miCSE"]
+
+# Compute embeddings for both lists
+embeddings_1 = model.encode(sentences_1, convert_to_tensor=True)
+embeddings_2 = model.encode(sentences_2, convert_to_tensor=True)
+
+# Compute pairwise cosine similarities
+cosine_sim_scores = cos_sim(embeddings_1, embeddings_2)
+
+# Output of results
+for i in range(len(sentences_1)):
+    print(f"Similarity {cosine_sim_scores[i]:.2f}: {sentences_1[i]} << vs. >> {sentences_2[i]}")
+```
+
+
 # Benchmark
 
 Model results on SentEval Benchmark:
@@ -160,7 +195,6 @@ Model results on SentEval Benchmark:
 +-------+-------+-------+-------+-------+--------------+-----------------+--------+
 ```
 
-
 ## Citations
 If you use this code in your research or want to refer to our work, please cite:
 
 
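The new Example 3 scores the two lists element-wise, i.e. sentence i of one list against sentence i of the other. If the full similarity matrix between the lists is wanted (the same broadcasting idea Example 1 applies with `unsqueeze`), sentence-transformers ships `util.cos_sim` for exactly that. A minimal sketch, assuming the same model setup as in the snippet above; the nested loop and variable names are only illustrative:

```python
from sentence_transformers import SentenceTransformer, models, util

# Rebuild the model as in Example 3 above (Transformer encoder + pooling head).
model_name = 'sap-ai-research/miCSE'
word_embedding_model = models.Transformer(model_name, max_seq_length=32)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

sentences_1 = ["This is a sentence for testing miCSE.",
               "This is using mutual information Contrastive Sentence Embeddings model."]
sentences_2 = ["This is testing miCSE.",
               "Similarity with miCSE"]

embeddings_1 = model.encode(sentences_1, convert_to_tensor=True)
embeddings_2 = model.encode(sentences_2, convert_to_tensor=True)

# Entry [i, j] holds the cosine similarity of sentences_1[i] and sentences_2[j].
sim_matrix = util.cos_sim(embeddings_1, embeddings_2)

for i, s1 in enumerate(sentences_1):
    for j, s2 in enumerate(sentences_2):
        print(f"Similarity {sim_matrix[i][j]:.2f}: {s1} << vs. >> {s2}")
```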
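Since `models.Transformer` simply wraps a standard Hugging Face checkpoint, the embeddings used in Example 1 can also be produced without sentence-transformers. A rough sketch, assuming mean pooling over the token embeddings (the default of the `models.Pooling` module above); swap in the `[CLS]` vector if that is the intended pooling:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('sap-ai-research/miCSE')
model = AutoModel.from_pretrained('sap-ai-research/miCSE')

sentences = ["This is a sentence for testing miCSE.",
             "This is testing miCSE."]

batch = tokenizer(sentences, max_length=32, padding=True,
                  truncation=True, return_tensors='pt')

with torch.no_grad():
    outputs = model(**batch)

# Assumption: mean-pool the token embeddings (masking out padding),
# mirroring the default models.Pooling behaviour in Example 3.
mask = batch['attention_mask'].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Pairwise similarity matrix via broadcasting, as in Example 1.
sim = nn.CosineSimilarity(dim=-1)
cos_sim = sim(embeddings.unsqueeze(1), embeddings.unsqueeze(0))
print(f"Distance: {cos_sim[0,1].detach().item()}")
```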