Ashish08 committed on
Commit 19386b7 • 1 Parent(s): a385de1

Update README.md

  1. README.md +33 -3
README.md CHANGED
@@ -1,13 +1,43 @@
  ---
  title: Multilingual Search Quora Similar Questions
- emoji: 🚀
  colorFrom: indigo
  colorTo: red
  sdk: gradio
- sdk_version: 4.39.0
  app_file: app.py
  pinned: false
  license: mit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: Multilingual Search Quora Similar Questions
+ emoji: 🔍🌍💬
  colorFrom: indigo
  colorTo: red
  sdk: gradio
+ sdk_version: 4.41.0
  app_file: app.py
  pinned: false
  license: mit
  ---
 
+ # Semantic Search App for Quora Dataset
+
+ This application provides multilingual semantic search over the Quora question dataset using sentence embeddings and a vector database. Unlike traditional keyword search, semantic search matches on the meaning of the query rather than its exact words, which tends to return more relevant results.
+
+ ## Features
+
+ - **Multilingual Semantic Search**: Users can search for similar questions in different languages.
+ - **Model**: Uses the `paraphrase-multilingual-mpnet-base-v2` model from Sentence Transformers to generate embeddings for both queries and questions.
+ - **Vector Database**: Embeddings are stored in and retrieved from the Pinecone vector database, which supports efficient similarity search.
+ - **Cosine Similarity**: Results are ranked by cosine similarity score, from highest to lowest, indicating how closely each question relates to the query.
+ - **Dynamic Query**: Users can adjust the number of similar questions retrieved with a slider.
+
+ ## How It Works
+
+ 1. **Embedding Generation**: The app uses the `paraphrase-multilingual-mpnet-base-v2` model to encode both the query and the questions from the Quora dataset into 768-dimensional embeddings.
+ 2. **Search Query**: When a user submits a search query, the app generates an embedding for it.
+ 3. **Similarity Search**: The query embedding is compared with the question embeddings stored in Pinecone using cosine similarity, and the top K most similar questions are retrieved.
+ 4. **Results Display**: The results are shown in a table, with each row displaying the question ID, the question text, and the similarity score.
+
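The similarity search in step 3 can be illustrated without Pinecone. The following is a minimal sketch, not the app's actual code: it uses toy 3-dimensional vectors in place of the 768-dimensional embeddings, and the names `cosine_similarity`, `top_k_similar`, and `stored_vectors` are hypothetical. In the app itself, this ranking is delegated to Pinecone.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, stored, k):
    """Return the k (question_id, score) pairs with the highest similarity."""
    scored = [(qid, cosine_similarity(query_vec, vec)) for qid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:k]


# Toy vectors standing in for stored question embeddings (illustrative only).
stored_vectors = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.6, 0.8, 0.0],
    "q3": [0.0, 0.0, 1.0],
}
results = top_k_similar([1.0, 0.0, 0.0], stored_vectors, k=2)
```

A brute-force scan like this is linear in the number of stored questions; a vector database is typically used so that the lookup stays fast as the dataset grows.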
+ ## Usage
+
+ 1. **Input Your Query**: Enter your search query in the text box provided.
+ 2. **Adjust Number of Results**: Use the slider to select how many similar questions to retrieve (between 3 and 10).
+ 3. **View Results**: Click the "Search" button. The app displays the most similar questions along with their similarity scores in a table.
+
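The usage steps above amount to bounding the requested result count and turning ranked matches into table rows. The sketch below shows one way this could look, assuming each match is a dict with hypothetical keys `id`, `text`, and `score`; the helper names and the sample matches are made up for illustration and are not taken from the app.

```python
MIN_RESULTS, MAX_RESULTS = 3, 10  # slider bounds stated in the Usage steps


def clamp_top_k(value):
    """Keep the requested result count within the slider's range."""
    return max(MIN_RESULTS, min(MAX_RESULTS, int(value)))


def build_rows(matches, top_k):
    """Return [question ID, question text, score] rows, highest score first."""
    ranked = sorted(matches, key=lambda m: m["score"], reverse=True)
    return [[m["id"], m["text"], round(m["score"], 4)]
            for m in ranked[:clamp_top_k(top_k)]]


# Made-up matches, used only to exercise the helpers.
sample_matches = [
    {"id": "17", "text": "What is the best way to learn Python?", "score": 0.8312},
    {"id": "42", "text": "How do I learn Python?", "score": 0.9127},
    {"id": "58", "text": "Is Java harder than Python?", "score": 0.5541},
    {"id": "73", "text": "How can I learn to cook?", "score": 0.2108},
]
rows = build_rows(sample_matches, top_k=2)  # 2 is below the minimum, so 3 rows come back
```

Rows in this shape can be passed directly to a tabular display such as a pandas `DataFrame` with matching column names.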
+ ## Technology Stack
+
+ - **Gradio**: Used to build the interactive user interface.
+ - **Pinecone**: Vector database for storing and querying embeddings.
+ - **Sentence Transformers**: Used for generating embeddings with the `paraphrase-multilingual-mpnet-base-v2` model.
+ - **Pandas**: Used for handling and displaying the results in a tabular format.