ksvmuralidhar committed · Commit 4cb94f8 (verified) · 1 Parent(s): f2f1bf9

Update app.py

Files changed (1)
  1. app.py +5 -5
app.py CHANGED
@@ -43,18 +43,18 @@ def find_similar_news(text: str, top_n: int=5):
 
 vectorizer = TextVectorizer()
 collection = get_milvus_collection()
-sent_model = SentenceTransformer('multi-qa-distilbert-cos-v1')
+sent_model = SentenceTransformer('all-mpnet-base-v2')
 
 def main():
 
     # st.title("Find Similar News")
-    st.markdown("<h3>Find Similar News With Sentence Transformers</h3>", unsafe_allow_html=True)
+    st.markdown("<h3>Find Similar News With Sentence Transformers (all-mpnet-base-v2)</h3>", unsafe_allow_html=True)
     desc = '''<p style="font-size: 13px;">
     Embeddings of 300,000 news headlines are stored in Milvus vector database, used as a feature store.
-    Embeddings of the input headline are computed using sentence transformers (multi-qa-distilbert-cos-v1).
+    Embeddings of the input headline are computed using sentence transformers (all-mpnet-base-v2).
     Similar news headlines are retrieved from the vector database using Euclidean distance as similarity metric.
-    <span style="color: red;">This method is found to be more accurate and faster (in terms of inserting embeddings into vector DB) compared to the method of extracting embeddings
-    from fine-tuned classification model, discussed </span><a href="https://huggingface.co/spaces/ksvmuralidhar/vector-db-search" target="_blank">here.</a>
+    <span style="color: red;">This method (all-mpnet-base-v2) has the best performance compared to multi-qa-distilbert-cos-v1 fine-tuned using TSDAE
+    and extracting embeddings from fine-tuned DistilBERT classifier.</span>
     </p>
     '''
     st.markdown(desc, unsafe_allow_html=True)
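
Note on the retrieval step: the description string in this diff says similar headlines are retrieved using Euclidean distance as the similarity metric. The body of `find_similar_news` is not part of this hunk, so the following is only a minimal sketch of that ranking idea on toy data, not the app's actual Milvus query. The helper `top_n_by_euclidean`, the `toy_index` dictionary, and the 3-dimensional vectors are all hypothetical; real sentence embeddings have far more dimensions and would come from the model's encoder.

```python
import heapq
import math


def top_n_by_euclidean(query, vectors, top_n=5):
    """Return the top_n (distance, key) pairs closest to `query`.

    `vectors` maps a key (here, a headline) to its embedding.
    Smaller Euclidean (L2) distance means more similar.
    """
    scored = ((math.dist(query, vec), key) for key, vec in vectors.items())
    return heapq.nsmallest(top_n, scored)


# Hypothetical toy "index": hand-written 3-d vectors standing in for
# stored headline embeddings (assumption, not data from the app).
toy_index = {
    "markets rally": [0.9, 0.1, 0.0],
    "stocks climb": [0.8, 0.2, 0.1],
    "team wins final": [0.0, 0.9, 0.4],
}

# Hypothetical query embedding for an input headline.
query_vec = [1.0, 0.0, 0.0]

results = top_n_by_euclidean(query_vec, toy_index, top_n=2)
for distance, headline in results:
    print(f"{distance:.3f}  {headline}")
```

If the stored and query embeddings are unit-normalized, ranking by Euclidean distance gives the same order as ranking by cosine similarity, since the squared distance between two unit vectors equals 2 minus twice their cosine similarity.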