Rabbitt-AI committed on
Commit 0edaf2b · verified · 1 Parent(s): ed170db

Update README.md

Files changed (1)
  1. README.md +19 -13
README.md CHANGED
@@ -11,7 +11,7 @@ sdk_version: 4.44.0
11
 
12
  ## **Overview:**
13
 
14
- ChanceRAG is a Retrieval-Augmented Generation (RAG) application designed to process documents (such as PDF and Docs) and retrieve relevant information to provide detailed and accurate responses based on user queries. The system leverages various retrieval techniques, including vector embeddings, annoy, BM25, and Word2Vec, and re-ranking methods advanced fusion. The application integrates with Mistral’s embedding model for generating embeddings and employs Annoy for efficient retrieval using angular distance.
15
 
16
  ## **Data Flow:**
17
 
@@ -27,11 +27,11 @@ ChanceRAG is a Retrieval-Augmented Generation (RAG) application designed to proc
27
 
28
  ### **2\. Query Handling and Retrieval:**
29
 
30
- * ### Upon receiving a query, the system creates embeddings for the query and employs various retrieval methods, including Annoy, TF-IDF, BM25, and Word2Vec, to fetch the most relevant document chunks.
31
 
32
  ### **3\. Re-ranking and Fusion:**
33
 
34
- * ### Retrieved document chunks are re-ranked using methods such as reciprocal rank fusion, advanced fusion retrieval, weighted score fusion, and semantic similarity.
35
 
36
  * ### The highest-ranked results are used to generate a final response.
37
 
@@ -39,7 +39,7 @@ ChanceRAG is a Retrieval-Augmented Generation (RAG) application designed to proc
39
 
40
  * ### Based on the retrieved context, ChanceRAG generates detailed, tailored responses using the Mistral AI API.
41
 
42
- * ### Users can customize the response style (e.g., Detailed, Concise, Creative, or Technical), retrieval methods, reranking strategies, chunk size, and overlap.
43
 
44
  ## **Components of the ChanceRAG System:**
45
 
@@ -61,23 +61,29 @@ ChanceRAG is a Retrieval-Augmented Generation (RAG) application designed to proc
61
  | Method | Speed | Accuracy | Memory Usage |
62
  | :---: | :---: | :---: | :---: |
63
  | Annoy | Fast | Good | Low |
64
- | TF-IDF | Fast | Moderate | Moderate |
65
  | BM25 | Fast | Good | Low |
66
  | Word2Vec | Slow | Good | High |
67
- | Euclidean | Fast | Moderate | Low |
68
- | Jaccard | Slow | Moderate | Low |
69
 
70
  Retrieval Methods Comparison
71
 
72
  ### **4\. Reranking Engine:**
73
 
74
- * Applies reranking methods like advanced fusion, reciprocal rank fusion, weighted score fusion, and semantic similarity to ensure the most relevant documents are prioritized.
75
- **Methods:**
76
- * Advanced Fusion
77
- * Reciprocal Rank Fusion
78
- * Weighted Score Fusion
79
- * Semantic Similarity Reranking
 
 
 
80
 
 
 
 
 
 
81
 
82
  ### **5\. Response Generator:**
83
 
 
11
 
12
  ## **Overview:**
13
 
14
+ ChanceRAG is a Retrieval-Augmented Generation (RAG) application designed to process documents (such as PDF and Docs) and retrieve relevant information to provide detailed and accurate responses based on user queries. The system leverages various retrieval techniques, including vector embeddings, Annoy, BM25, and Word2Vec, and uses advanced fusion for reranking. The application integrates with Mistral’s embedding model for generating embeddings and employs Annoy for efficient retrieval using angular distance.
15
 
16
  ## **Data Flow:**
17
 
 
27
 
28
  ### **2\. Query Handling and Retrieval:**
29
 
30
+ * ### Upon receiving a query, the system creates embeddings for the query and employs various retrieval methods, including Annoy, BM25, and Word2Vec, to fetch the most relevant document chunks.
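For intuition about the vector-retrieval step: the Overview says Annoy is used with angular distance, which Annoy documents as the Euclidean distance between normalized vectors, sqrt(2 * (1 - cos(u, v))). The sketch below is a minimal plain-Python illustration of that metric, assuming an exact brute-force scan in place of Annoy's approximate index. The names `angular_distance` and `nearest_chunks` are hypothetical and do not come from the ChanceRAG code, and the handling of zero-length vectors is an assumption of this sketch.

```python
import math


def angular_distance(u, v):
    """Angular distance in Annoy's sense: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption of this sketch: treat a zero vector as orthogonal.
        return math.sqrt(2.0)
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.sqrt(2.0 * (1.0 - cos))


def nearest_chunks(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunks closest to the query (exact scan)."""
    order = sorted(
        range(len(chunk_vecs)),
        key=lambda i: angular_distance(query_vec, chunk_vecs[i]),
    )
    return order[:k]
```

An approximate index such as Annoy trades a small amount of recall for much faster lookups than this linear scan, which is why it suits larger document collections.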
31
 
32
  ### **3\. Re-ranking and Fusion:**
33
 
34
+ * ### Retrieved document chunks are re-ranked using advanced fusion retrieval.
35
 
36
  * ### The highest-ranked results are used to generate a final response.
37
 
 
39
 
40
  * ### Based on the retrieved context, ChanceRAG generates detailed, tailored responses using the Mistral AI API.
41
 
42
+ * ### Users can customize the response style (e.g., Detailed, Concise, Creative, or Technical).
43
 
44
  ## **Components of the ChanceRAG System:**
45
 
 
61
  | Method | Speed | Accuracy | Memory Usage |
62
  | :---: | :---: | :---: | :---: |
63
  | Annoy | Fast | Good | Low |
 
64
  | BM25 | Fast | Good | Low |
65
  | Word2Vec | Slow | Good | High |
66
+
 
67
 
68
  Retrieval Methods Comparison
69
 
70
  ### **4\. Reranking Engine:**
71
 
72
+ * Applies the advanced fusion reranking method to ensure the most relevant documents are prioritized.
73
+
74
+ * The advanced_fusion mechanism combines multiple retrieval methods (BM25 and Annoy) to rank documents more effectively.
75
+
76
+ * The method retrieves documents using different retrieval methods (Annoy for nearest neighbors and BM25 for traditional ranking).
77
+
78
+ * A similarity graph (sim_graph) is built by calculating cosine similarities between the embeddings of the documents. For any document pair where the cosine similarity is greater than 0.5, an edge is created in the graph with a weight equal to the similarity score. Then, the PageRank algorithm is applied to the similarity graph to score documents based on their relative importance in this network.
79
+
80
+ * The fusion process combines the vector scores (Annoy), BM25 scores, and PageRank scores into a single score. Each component is given a weight:
81
 
82
+ 0.5 for vector scores,
83
+ 0.3 for BM25 scores,
84
+ 0.2 for PageRank scores
85
+ * The documents are sorted based on the combined scores in descending order. The top 5 documents are returned with their text.
86
+
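The fusion steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's `advanced_fusion` implementation: the function signatures, the hand-rolled `pagerank` with a damping factor of 0.85, and the division of each score component by its maximum are all choices made for this sketch. Only the 0.5 similarity threshold, the 0.5 / 0.3 / 0.2 weights, and the top-5 cutoff come from the README text.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; graph is {node: {neighbour: weight}}."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in graph}
    out_weight = {node: sum(edges.values()) for node, edges in graph.items()}
    for _ in range(iterations):
        # Nodes without edges spread their rank evenly over all nodes.
        dangling = sum(rank[node] for node in graph if out_weight[node] == 0)
        rank = {
            node: (1 - damping) / n + damping * (
                dangling / n
                + sum(rank[nb] * w / out_weight[nb] for nb, w in edges.items())
            )
            for node, edges in graph.items()
        }
    return rank


def scale(scores, ids):
    """Divide by the maximum so the top score is 1 (assumption of this sketch)."""
    top = max((scores.get(d, 0.0) for d in ids), default=0.0)
    return {d: (scores.get(d, 0.0) / top if top > 0 else 0.0) for d in ids}


def advanced_fusion(embeddings, vector_scores, bm25_scores,
                    top_k=5, threshold=0.5, weights=(0.5, 0.3, 0.2)):
    """Fuse vector, BM25, and PageRank scores; return top_k (doc_id, score) pairs."""
    ids = list(embeddings)
    # Similarity graph: one weighted edge per pair with cosine above the threshold.
    sim_graph = {d: {} for d in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim > threshold:
                sim_graph[a][b] = sim_graph[b][a] = sim
    vec = scale(vector_scores, ids)
    bm25 = scale(bm25_scores, ids)
    pr = scale(pagerank(sim_graph), ids)
    w_vec, w_bm25, w_pr = weights
    combined = {d: w_vec * vec[d] + w_bm25 * bm25[d] + w_pr * pr[d] for d in ids}
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

The sketch returns document ids with their combined scores; a caller would look up the chunk text by id to match the README's "returned with their text". Without some form of scaling, raw BM25 scores could dominate the weighted sum, which is why the sketch rescales each component before applying the weights.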
87
 
88
  ### **5\. Response Generator:**
89