similar-news-sentence-transformer1

ksvmuralidhar committed on Jan 13, 2024

Commit

4cb94f8

1 Parent(s): f2f1bf9

Update app.py

Files changed (1)

app.py CHANGED

@@ -43,18 +43,18 @@ def find_similar_news(text: str, top_n: int=5):
 vectorizer = TextVectorizer()
 collection = get_milvus_collection()
-sent_model = SentenceTransformer('multi-qa-distilbert-cos-v1')
 def main():
     # st.title("Find Similar News")
-    st.markdown("<h3>Find Similar News With Sentence Transformers</h3>", unsafe_allow_html=True)
     desc = '''<p style="font-size: 13px;">
     Embeddings of 300,000 news headlines are stored in Milvus vector database, used as a feature store.
-    Embeddings of the input headline are computed using sentence transformers (multi-qa-distilbert-cos-v1).
     Similar news headlines are retrieved from the vector database using Euclidean distance as similarity metric.
-    <span style="color: red;">This method is found to be more accurate and faster (in terms of inserting embeddings into vector DB) compared to the method of extracting embeddings
-    from fine-tuned classification model, discussed </span><a href="https://huggingface.co/spaces/ksvmuralidhar/vector-db-search" target="_blank">here.</a>
     </p>
     '''
     st.markdown(desc, unsafe_allow_html=True)

 vectorizer = TextVectorizer()
 collection = get_milvus_collection()
+sent_model = SentenceTransformer('all-mpnet-base-v2')
 def main():
     # st.title("Find Similar News")
+    st.markdown("<h3>Find Similar News With Sentence Transformers (all-mpnet-base-v2)</h3>", unsafe_allow_html=True)
     desc = '''<p style="font-size: 13px;">
     Embeddings of 300,000 news headlines are stored in Milvus vector database, used as a feature store.
+    Embeddings of the input headline are computed using sentence transformers (all-mpnet-base-v2).
     Similar news headlines are retrieved from the vector database using Euclidean distance as similarity metric.
+    <span style="color: red;">This method (all-mpnet-base-v2) has the best performance compared to multi-qa-distilbert-cos-v1 fine-tuned using TSDAE
+    and extracting embeddings from fine-tuned DistilBERT classifier.</span>
     </p>
     '''
     st.markdown(desc, unsafe_allow_html=True)
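
The description string in this diff says similar headlines are retrieved using Euclidean distance as the similarity metric. As a rough illustration of that ranking step only, here is a minimal pure-Python sketch. It does not use Milvus or the sentence-transformer model: the helper names `euclidean` and `top_n_similar`, the three-dimensional vectors, and the headlines are all made up for illustration and are not taken from app.py.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n_similar(
    query: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    top_n: int = 5,
) -> List[Tuple[float, str]]:
    """Return the top_n (distance, headline) pairs, closest first.

    This is a brute-force scan; a vector database would normally use
    an index instead of comparing against every stored vector.
    """
    scored = [(euclidean(query, vec), headline) for headline, vec in corpus]
    scored.sort(key=lambda pair: pair[0])  # smaller distance = more similar
    return scored[:top_n]


# Hypothetical toy embeddings (real sentence embeddings have hundreds of dimensions).
corpus = [
    ("Stocks rally as inflation cools", [0.9, 0.1, 0.0]),
    ("Central bank holds interest rates", [0.8, 0.2, 0.1]),
    ("Local team wins championship", [0.0, 0.1, 0.9]),
]
query = [0.9, 0.1, 0.05]  # stand-in for the embedding of the input headline

results = top_n_similar(query, corpus, top_n=2)
for distance, headline in results:
    print(f"{distance:.3f}  {headline}")
```

With Euclidean distance, a smaller value means a closer match, so results are sorted in ascending order. If the stored embeddings are unit-length, ranking by Euclidean distance gives the same order as ranking by cosine similarity, since the squared distance between unit vectors equals 2 minus twice their cosine.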