Spaces: GIZ/embedding_visualisation
peter2000 committed on Nov 7, 2022

Commit 402af37

1 Parent(s): aae2963

Update apps/similarity.py

Files changed (1)

apps/similarity.py CHANGED

@@ -10,7 +10,7 @@ def app():
         st.write(
             """
             Information cartography - Get your word/phrase/sentence/paragraph embedded and visualized.
-            The (English) sentence-transformers model "all-MiniLM-L6-v2" maps sentences & paragraphs to a 384 dimensional dense vector space This is normally used for tasks like clustering or semantic search, but in this case, we use it to calculate the (cosine) similarity. The sentence transformer is context sensitive and works best with whole sentences, to account for that we extend your text with "The book is about <text>" if its less than 15 characters.
             Simply put in your text and press COMPARE, the higher the similarity the closer the text in the embedding space (max 1).
             """)
@@ -28,9 +28,9 @@ def app():
         with st.spinner("Embedding comparing  your inputs"):
             document = [word_to_embed1 ,word_to_embed2]
-            documents_embed = ["The book is about "+ wte for wte in document if len(wte) <15]
             #Encode paragraphs
-            document_embeddings = model.encode(documents_embed , show_progress_bar=False)
             #Compute cosine similarity between labels sentences and paragraphs
             similarity_matrix = cosine_similarity(document_embeddings)
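
Note on the removed `documents_embed` line: the trailing `if len(wte) <15` is a comprehension filter, so any input of 15 or more characters is dropped from the list rather than passed through unprefixed. The removed description reads as if only short inputs should be extended. A minimal sketch of that reading follows; the helper name `add_context`, the constants, and the sample texts are hypothetical and not part of the commit.

```python
PREFIX = "The book is about "  # prefix string from the removed line
MIN_CHARS = 15                 # threshold from the removed description


def add_context(texts, prefix=PREFIX, min_chars=MIN_CHARS):
    """Prefix texts shorter than min_chars; return longer texts unchanged."""
    return [prefix + t if len(t) < min_chars else t for t in texts]


# Hypothetical inputs: one short word, one full phrase.
document = ["climate", "Adaptation to climate change in coastal regions"]

# Filtering comprehension, same shape as the removed line:
# the long input disappears from the result.
filtered = [PREFIX + t for t in document if len(t) < MIN_CHARS]

# Conditional expression: every input is kept, only short ones are prefixed.
padded = add_context(document)

print(len(filtered), len(padded))
```

With the filtering form, the list handed to the encoder can hold fewer than two texts, so the resulting similarity matrix would no longer compare both inputs.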

         st.write(
             """
             Information cartography - Get your word/phrase/sentence/paragraph embedded and visualized.
+            The (English) sentence-transformers model "all-MiniLM-L6-v2" maps sentences & paragraphs to a 384 dimensional dense vector space This is normally used for tasks like clustering or semantic search, but in this case, we use it to calculate the (cosine) similarity. The sentence transformer is context sensitive and works best with whole sentences.
             Simply put in your text and press COMPARE, the higher the similarity the closer the text in the embedding space (max 1).
             """)
         with st.spinner("Embedding comparing  your inputs"):
             document = [word_to_embed1 ,word_to_embed2]
             #Encode paragraphs
+            document_embeddings = model.encode(document , show_progress_bar=False)
             #Compute cosine similarity between labels sentences and paragraphs
             similarity_matrix = cosine_similarity(document_embeddings)
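
For readers unfamiliar with the last step: the diff does not show where `cosine_similarity` is imported from, but a single-argument call like this one typically yields a pairwise matrix, where entry [i][j] is the cosine of the angle between embedding i and embedding j. The sketch below is a pure-Python stand-in under that assumption, not the function the app uses, and the 3-dimensional vectors are toy values rather than real 384-dimensional model output.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for a list of vectors.

    Assumes every vector has the same length and a non-zero norm.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Toy stand-ins for the two embeddings produced by model.encode(document).
toy_embeddings = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
matrix = cosine_similarity_matrix(toy_embeddings)

# The off-diagonal entry is the similarity between the two inputs;
# the diagonal entries compare each input with itself.
print(matrix[0][1])
```

For two inputs the matrix is 2 x 2 and symmetric, so the value of interest is the off-diagonal entry, which matches the "max 1" wording in the app description.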