## **Overview:**

ChanceRAG is a Retrieval-Augmented Generation (RAG) application designed to process documents (such as PDFs and Docs) and retrieve the relevant information needed to provide detailed, accurate responses to user queries. The system leverages several retrieval techniques, including vector embeddings, Annoy, BM25, and Word2Vec, and applies advanced fusion for reranking. The application integrates Mistral’s embedding model for generating embeddings and employs Annoy for efficient retrieval using angular distance.
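The README does not include the indexing code itself; the following is a minimal sketch of how the pieces described above could fit together, assuming the `mistralai` Python client and the `annoy` package. The names (`embed_texts`, `build_index`, `chunks`) are illustrative, not taken from the ChanceRAG source.

```python
# Minimal sketch (not the ChanceRAG source): embed document chunks with the
# Mistral embedding model and index them with Annoy using angular distance.
import os

from annoy import AnnoyIndex
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def embed_texts(texts):
    # "mistral-embed" returns 1024-dimensional embeddings.
    response = client.embeddings.create(model="mistral-embed", inputs=texts)
    return [item.embedding for item in response.data]

def build_index(chunks, dim=1024, n_trees=10):
    # Annoy's "angular" metric approximates cosine distance.
    index = AnnoyIndex(dim, "angular")
    for i, vector in enumerate(embed_texts(chunks)):
        index.add_item(i, vector)
    index.build(n_trees)
    return index

chunks = ["First document chunk ...", "Second document chunk ..."]
index = build_index(chunks)
```

At query time the same `index` can be searched with `index.get_nns_by_vector`, as sketched in the retrieval step below.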
## **Data Flow:**
### **2\. Query Handling and Retrieval:**
* ### Upon receiving a query, the system creates embeddings for the query and employs various retrieval methods, including Annoy, BM25, and Word2Vec, to fetch the most relevant document chunks.
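As a hedged illustration of this step, the sketch below reuses `embed_texts`, `index`, and `chunks` from the indexing sketch above and adds BM25 scoring via the `rank_bm25` package; the Word2Vec path is omitted for brevity, and the function name `retrieve` is illustrative.

```python
# Sketch of query-time retrieval: embed the query, search the Annoy index,
# and score the same chunks with BM25. Word2Vec retrieval is omitted here.
from rank_bm25 import BM25Okapi

def retrieve(query, index, chunks, top_k=5):
    # Vector retrieval: approximate nearest neighbors by angular distance.
    query_vec = embed_texts([query])[0]
    annoy_ids, annoy_dists = index.get_nns_by_vector(
        query_vec, top_k, include_distances=True
    )

    # Lexical retrieval: BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    bm25_scores = bm25.get_scores(query.lower().split())

    return annoy_ids, annoy_dists, bm25_scores
```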
### **3\. Re-ranking and Fusion:**
* ### Retrieved document chunks are re-ranked using advanced fusion retrieval.
* ### The highest-ranked results are used to generate a final response.
* ### Based on the retrieved context, ChanceRAG generates detailed, tailored responses using the Mistral AI API.
* ### Users can customize the response style (e.g., Detailed, Concise, Creative, or Technical).
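A minimal sketch of these two points, assuming the `mistralai` chat API (`client.chat.complete`) and reusing the `client` from the indexing sketch; the prompt wording, model name, and style handling are assumptions rather than the ChanceRAG implementation.

```python
# Sketch of response generation: pass the retrieved context and the chosen
# response style to a Mistral chat model. Prompt and model name are assumed.
def generate_response(query, context_chunks, style="Detailed"):
    prompt = (
        "Answer the question using only the context below.\n"
        f"Response style: {style}\n\n"
        "Context:\n"
        + "\n\n".join(context_chunks)
        + f"\n\nQuestion: {query}"
    )
    completion = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```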
## **Components of the ChanceRAG System:**
| Method | Speed | Accuracy | Memory Usage |
| :---: | :---: | :---: | :---: |
| Annoy | Fast | Good | Low |
| BM25 | Fast | Good | Low |
| Word2Vec | Slow | Good | High |

Retrieval Methods Comparison
### **4\. Reranking Engine:**
* Applies an advanced fusion reranking method to ensure the most relevant documents are prioritized.
* The advanced_fusion mechanism combines multiple retrieval methods (BM25 and Annoy) to rank documents more effectively.
* The method retrieves documents using the different retrieval methods (Annoy for nearest-neighbor search and BM25 for traditional keyword ranking).
* A similarity graph (sim_graph) is built by calculating cosine similarities between the embeddings of the documents. For any document pair where the cosine similarity is greater than 0.5, an edge is created in the graph with a weight equal to the similarity score. Then, the PageRank algorithm is applied to the similarity graph to score documents based on their relative importance in this network.
* The fusion process combines the vector scores (Annoy), BM25 scores, and PageRank scores into a single score. Each component is given a weight:
  * 0.5 for vector scores,
  * 0.3 for BM25 scores,
  * 0.2 for PageRank scores.
* The documents are sorted by their combined scores in descending order, and the top 5 documents are returned with their text (see the sketch below).
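The sketch below is one way to implement the fusion scoring described in this list, assuming `numpy` for the cosine-similarity matrix and `networkx` for PageRank. The README does not specify how the raw Annoy and BM25 scores are normalized before weighting, so the min-max normalization (and all names) below are assumptions.

```python
# Sketch of the advanced fusion reranking described above: build a cosine-
# similarity graph (edges where similarity > 0.5), run PageRank, then combine
# vector, BM25, and PageRank scores with weights 0.5 / 0.3 / 0.2.
import numpy as np
import networkx as nx

def _normalize(scores):
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.ones_like(scores)

def advanced_fusion(chunks, embeddings, vector_scores, bm25_scores, top_k=5):
    # `vector_scores` are per-chunk similarities from the Annoy step and
    # `bm25_scores` are per-chunk BM25 scores (both "higher is better").
    embeddings = np.asarray(embeddings, dtype=float)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cosine = unit @ unit.T

    # Similarity graph: an edge for every pair with cosine similarity > 0.5,
    # weighted by the similarity score.
    sim_graph = nx.Graph()
    sim_graph.add_nodes_from(range(len(chunks)))
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            if cosine[i, j] > 0.5:
                sim_graph.add_edge(i, j, weight=float(cosine[i, j]))

    # PageRank over the weighted graph scores each document's importance.
    pagerank = nx.pagerank(sim_graph, weight="weight")
    pagerank_scores = [pagerank[i] for i in range(len(chunks))]

    # Weighted fusion of the three score components.
    combined = (
        0.5 * _normalize(vector_scores)
        + 0.3 * _normalize(bm25_scores)
        + 0.2 * _normalize(pagerank_scores)
    )

    # Return the top_k (top 5) chunks by combined score, highest first.
    top = np.argsort(combined)[::-1][:top_k]
    return [(chunks[i], float(combined[i])) for i in top]
```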
### **5\. Response Generator:**