JarvisChan630 committed on
Commit
41b582a
·
1 Parent(s): 67b3290
README.md CHANGED
@@ -1,46 +1,47 @@
1
- # Meta Expert
2
 
3
  A project for versatile AI agents that can run with proprietary models or completely open-source. The meta expert has two agents: a basic [Meta Agent](Docs/Meta-Prompting%20Overview.MD), and [Jar3d](Docs/Introduction%20to%20Jar3d.MD), a more sophisticated and versatile agent.
4
 
5
- Act as an opne source perplexity.
6
 
7
  Thanks to John Adeojo, who brought this wonderful project to the open-source community!
8
 
9
- ## PMF - What problem this project has solved?
 
 
 
 
 
 
 
 
 
 
 
10
 
11
- ## Technical Detail
12
- What is the logics?
13
 
14
- LLM Application Workflow
 
15
  1. User Query: The user initiates the interaction by submitting a query or request for information.
16
  2. Agent Accesses the Internet: The agent retrieves relevant information from various online sources, such as web pages, articles, and databases.
17
- 3. Document Chunking: The retrieved URLs are processed to break down the content into smaller, manageable documents or chunks. This step ensures that the information is more digestible and can be analyzed effectively.
18
  4. Vectorization: Each document chunk is then transformed into a multi-dimensional embedding using vectorization techniques. This process captures the semantic meaning of the text, allowing for nuanced comparisons between different pieces of information.
19
  5. Similarity Search: A similarity search is performed using cosine similarity (or another appropriate metric) to identify and rank the most relevant document chunks in relation to the original user query. This step helps in finding the closest matches based on the embeddings generated earlier.
20
  6. Response Generation: Finally, the most relevant chunks are selected, and the LLM synthesizes them into a coherent response that directly addresses the user's query.
21
 
22
  ## Bullet points
23
- - By implemented RAG, Chain-of-Reasoning, and Meta-Prompting to complete long-running research tasks.
24
-
25
- - Neo4j Knowledge Graphs
26
- -Why use this?
27
- naive RAG:
28
- ![naive](image.png)
29
- Complex:
30
- ![why need graph](assets/image.png)
31
-
32
 
33
- - Docker for backend
34
-
35
- - NLM-Ingestor - llmsherpa API - Chunk data
36
 
 
 
37
 
 
38
 
39
 
40
- ## FAQ
41
- 1. Is it necessary for a recursion more than 30 rounds? Is it spending money too much?
42
-
43
 
 
 
44
 
45
  ## Table of Contents
46
 
@@ -115,6 +116,7 @@ This project leverages four core concepts:
115
  nano config/config.yaml
116
  ```
117
 
 
118
  ### API Key Configuration
119
 
120
  Enter API Keys for your choice of LLM provider:
@@ -222,9 +224,3 @@ Refer to the project's GitHub issues for common problems and solutions.
222
 
223
  Once you're set up, Jar3d will proceed to introduce itself and ask some questions. The questions are designed to help you refine your requirements. When you feel you have provided all the relevant information to Jar3d, you can end the questioning part of the workflow by typing `/end`.
224
 
225
- ## Roadmap for Jar3d
226
-
227
- - Feedback to Jar3d so that final responses can be iterated on and amended.
228
- - Long-term memory.
229
- - Full Ollama and vLLM integration.
230
- - Integrations to RAG platforms for more intelligent document processing and faster RAG.
 
1
+ # Super Expert
2
 
3
  A project for versatile AI agents that can run with proprietary models or completely open-source. The meta expert has two agents: a basic [Meta Agent](Docs/Meta-Prompting%20Overview.MD), and [Jar3d](Docs/Introduction%20to%20Jar3d.MD), a more sophisticated and versatile agent.
4
 
5
+ Acts as an open-source Perplexity.
6
 
7
  Thanks to John Adeojo, who brought this wonderful project to the open-source community!
8
 
9
+ ## Tech Stack
10
+ - LLM (OpenAI, Claude, Llama)
11
+ - Frontend (Chainlit, chain-of-thought reasoning)
12
+ - Backend
13
+ - Python
14
+ - Docker
15
+ - Hugging Face deploy
16
+
17
+ ## TODO
18
+ - [ ] Long-term memory.
19
+ - [ ] Full Ollama and vLLM integration.
20
+ - [ ] Integrations to RAG platforms for more intelligent document processing and faster RAG.
21
 
22
+ ## PMF - What problem does this project solve?
 
23
 
24
+ ## Business Logic
25
+ ### LLM Application Workflow
26
  1. User Query: The user initiates the interaction by submitting a query or request for information.
27
  2. Agent Accesses the Internet: The agent retrieves relevant information from various online sources, such as web pages, articles, and databases.
28
+ 3. Document Chunking: The content behind each retrieved URL is broken down into smaller, manageable documents or chunks. This step makes the information more digestible and easier to analyze (see `run_rag` in `tools/legacy/offline_graph_rag_tool copy.py`).
29
  4. Vectorization: Each document chunk is then transformed into a multi-dimensional embedding using vectorization techniques. This process captures the semantic meaning of the text, allowing for nuanced comparisons between different pieces of information.
30
  5. Similarity Search: A similarity search is performed using cosine similarity (or another appropriate metric) to identify and rank the most relevant document chunks in relation to the original user query. This step helps in finding the closest matches based on the embeddings generated earlier.
31
  6. Response Generation: Finally, the most relevant chunks are selected, and the LLM synthesizes them into a coherent response that directly addresses the user's query.
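
Steps 3 to 5 above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the bag-of-words `embed` function is a toy stand-in for a real embedding model (the project's log messages mention FastEmbed and FAISS), and the names `chunk_text`, `embed`, `cosine_similarity`, and `rank_chunks` are hypothetical, not functions from this repository.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Step 3: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Step 4 (toy version): a bag-of-words count vector.

    Assumption: a real pipeline would call an embedding model here.
    """
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(chunks: list[str], query: str, top_k: int = 2) -> list[tuple[float, str]]:
    """Step 5: score every chunk against the query and keep the best top_k."""
    query_vector = embed(query)
    scored = [(cosine_similarity(embed(chunk), query_vector), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


page = (
    "Neo4j stores data as a graph of nodes and relationships. "
    "FAISS performs fast vector similarity search over embeddings."
)
chunks = chunk_text(page, max_words=10)
best = rank_chunks(chunks, "vector similarity search", top_k=1)
print(best[0][1])
```

The highest-ranked chunks would then be passed to the LLM as context for step 6.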
32
 
33
  ## Bullet points
 
 
 
 
 
 
 
 
 
34
 
 
 
 
35
 
36
+ ## FAQ
37
+ 1. How does this system work?
38
 
39
+ 2.
40
 
41
 
 
 
 
42
 
43
+ 2. How does hybrid retrieval work?
44
+ In the `offline_graph_rag` file, we combine similarity search with knowledge-graph retrieval.
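
As a rough illustration of hybrid retrieval, the sketch below mirrors how `run_hybrid_graph_retrieval` in `tools/offline_graph_rag_tool.py` joins graph query results and similarity-ranked chunks into a single context string. The helper name `build_hybrid_context` and the sample data are hypothetical; in the project the two inputs come from `graph.query(...)` and `index_and_rank(corpus, query)`.

```python
def build_hybrid_context(graph_rows: list[dict], ranked_chunks: list[dict]) -> str:
    """Join graph relationships and ranked text chunks into one prompt context.

    The format string follows the one used in run_hybrid_graph_retrieval.
    """
    return f"Important Relationships:{graph_rows}\n\n Additional Context:{ranked_chunks}"


# Hypothetical sample data standing in for real retrieval results.
graph_rows = [{"source": "A", "relationship": "LINKS_TO", "target": "B"}]
ranked_chunks = [{"text": "example chunk text", "score": 0.9}]

context = build_hybrid_context(graph_rows, ranked_chunks)
print(context)
```

The resulting string is what the LLM receives as context when the "Graph and Dense" setting is selected.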
45
 
46
  ## Table of Contents
47
 
 
116
  nano config/config.yaml
117
  ```
118
 
119
+ If you want to use hybrid search, open the settings and choose "Graph and Dense".
120
  ### API Key Configuration
121
 
122
  Enter API Keys for your choice of LLM provider:
 
224
 
225
  Once you're set up, Jar3d will proceed to introduce itself and ask some questions. The questions are designed to help you refine your requirements. When you feel you have provided all the relevant information to Jar3d, you can end the questioning part of the workflow by typing `/end`.
226
 
 
 
 
 
 
 
tools/legacy/offline_graph_rag_tool copy.py CHANGED
@@ -1,3 +1,5 @@
 
 
1
  import sys
2
  import os
3
  root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -224,6 +226,7 @@ def run_hybrid_graph_retrrieval(graph: Neo4jGraph = None, corpus: List[Document]
224
  print(colored("Running Hybrid Retrieval...", "yellow"))
225
  unstructured_data = index_and_rank(corpus, query)
226
 
 
227
  query = f"""
228
  MATCH p = (n)-[r]->(m)
229
  WHERE COUNT {{(n)--()}} > 30
@@ -241,6 +244,7 @@ def run_hybrid_graph_retrrieval(graph: Neo4jGraph = None, corpus: List[Document]
241
  return retrieved_context
242
 
243
 
 
244
  @timeout(20) # Change: Takes url and query as input
245
  def intelligent_chunking(url: str, query: str) -> List[Document]:
246
  try:
@@ -251,7 +255,9 @@ def intelligent_chunking(url: str, query: str) -> List[Document]:
251
  raise ValueError("LLM_SHERPA_SERVER environment variable is not set")
252
 
253
  corpus = []
254
-
 
 
255
  try:
256
  print(colored("Starting LLM Sherpa LayoutPDFReader...\n\n", "yellow"))
257
  reader = LayoutPDFReader(llmsherpa_api_url)
@@ -261,7 +267,7 @@ def intelligent_chunking(url: str, query: str) -> List[Document]:
261
  print(colored(f"Error in LLM Sherpa LayoutPDFReader: {str(e)}", "red"))
262
  traceback.print_exc()
263
  doc = None
264
-
265
  if doc:
266
  for chunk in doc.chunks():
267
  document = Document(
@@ -321,6 +327,8 @@ def create_graph_index(
321
  ) -> Neo4jGraph:
322
 
323
  if os.environ.get('LLM_SERVER') == "openai":
 
 
324
  llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
325
 
326
  else:
@@ -370,6 +378,7 @@ def create_graph_index(
370
 
371
  def run_rag(urls: List[str], allowed_nodes: list[str] = None, allowed_relationships: list[str] = None, query: List[str] = None, hybrid: bool = False) -> List[Dict[str, str]]:
372
  # Change: adapted to take query and url as input.
 
373
  with concurrent.futures.ThreadPoolExecutor(max_workers=min(len(urls), 5)) as executor:
374
  futures = [executor.submit(intelligent_chunking, url, query) for url, query in zip(urls, query)]
375
  chunks_list = [future.result() for future in concurrent.futures.as_completed(futures)]
@@ -382,6 +391,7 @@ def run_rag(urls: List[str], allowed_nodes: list[str] = None, allowed_relationsh
382
 
383
  print(colored(f"\n\n DEBUG HYBRID VALUE: {hybrid}\n\n", "yellow"))
384
 
 
385
  if hybrid:
386
  print(colored(f"\n\n Creating Graph Index...\n\n", "green"))
387
  graph = Neo4jGraph()
 
1
+ # Hybrid RAG: combines similarity search with a knowledge graph
2
+
3
  import sys
4
  import os
5
  root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 
226
  print(colored("Running Hybrid Retrieval...", "yellow"))
227
  unstructured_data = index_and_rank(corpus, query)
228
 
229
+ # Only a subset is fed to Jar3d: nodes with more than 30 relationships
230
  query = f"""
231
  MATCH p = (n)-[r]->(m)
232
  WHERE COUNT {{(n)--()}} > 30
 
244
  return retrieved_context
245
 
246
 
247
+ # The chunking process begins with the intelligent_chunking function, which takes a URL and a query as input parameters.
248
  @timeout(20) # Change: Takes url and query as input
249
  def intelligent_chunking(url: str, query: str) -> List[Document]:
250
  try:
 
255
  raise ValueError("LLM_SHERPA_SERVER environment variable is not set")
256
 
257
  corpus = []
258
+ # The function uses LayoutPDFReader to read and extract text from the PDF document at the given URL.
259
+ # This is done by calling the LLM Sherpa API, which handles the PDF reading and layout analysis.
260
+ #
261
  try:
262
  print(colored("Starting LLM Sherpa LayoutPDFReader...\n\n", "yellow"))
263
  reader = LayoutPDFReader(llmsherpa_api_url)
 
267
  print(colored(f"Error in LLM Sherpa LayoutPDFReader: {str(e)}", "red"))
268
  traceback.print_exc()
269
  doc = None
270
+ # Once the document is retrieved, it is processed into smaller, manageable chunks. Each chunk represents a segment of the document that retains semantic meaning and context.
271
  if doc:
272
  for chunk in doc.chunks():
273
  document = Document(
 
327
  ) -> Neo4jGraph:
328
 
329
  if os.environ.get('LLM_SERVER') == "openai":
330
+ # Building the graph index can require hundreds of API calls
331
+ # An index is created for every small chunk
332
  llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
333
 
334
  else:
 
378
 
379
  def run_rag(urls: List[str], allowed_nodes: list[str] = None, allowed_relationships: list[str] = None, query: List[str] = None, hybrid: bool = False) -> List[Dict[str, str]]:
380
  # Change: adapted to take query and url as input.
381
+ # Intelligent document chunking
382
  with concurrent.futures.ThreadPoolExecutor(max_workers=min(len(urls), 5)) as executor:
383
  futures = [executor.submit(intelligent_chunking, url, query) for url, query in zip(urls, query)]
384
  chunks_list = [future.result() for future in concurrent.futures.as_completed(futures)]
 
391
 
392
  print(colored(f"\n\n DEBUG HYBRID VALUE: {hybrid}\n\n", "yellow"))
393
 
394
+ # Combine with the knowledge graph when hybrid retrieval is enabled
395
  if hybrid:
396
  print(colored(f"\n\n Creating Graph Index...\n\n", "green"))
397
  graph = Neo4jGraph()
tools/offline_graph_rag_tool.py CHANGED
@@ -67,7 +67,7 @@ def deduplicate_results(results, rerank=True):
67
  unique_results.append(result)
68
  return unique_results
69
 
70
-
71
  def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20, batch_size: int = 25) -> List[Dict[str, str]]:
72
  print(colored(f"\n\nStarting indexing and ranking with FastEmbeddings and FAISS for {len(corpus)} documents\n\n", "green"))
73
  CACHE_DIR = "/app/fastembed_cache"
@@ -78,12 +78,13 @@ def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20,
78
  try:
79
  # Initialize an empty FAISS index
80
  index = None
81
- docstore = InMemoryDocstore({})
82
  index_to_docstore_id = {}
83
 
84
  # Process documents in batches
85
  for i in range(0, len(corpus), batch_size):
86
  batch = corpus[i:i+batch_size]
 
87
  texts = [doc.page_content for doc in batch]
88
  metadatas = [doc.metadata for doc in batch]
89
 
@@ -215,13 +216,18 @@ def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20,
215
 
216
  return final_results
217
 
 
218
  def run_hybrid_graph_retrieval(graph: Neo4jGraph = None, corpus: List[Document] = None, query: str = None, hybrid: bool = False):
219
  print(colored(f"\n\nInitiating Retrieval...\n\n", "green"))
220
 
221
  if hybrid:
222
  print(colored("Running Hybrid Retrieval...", "yellow"))
223
- unstructured_data = index_and_rank(corpus, query)
 
224
 
 
 
 
225
  query = f"""
226
  MATCH p = (n)-[r]->(m)
227
  WHERE COUNT {{(n)--()}} > 30
@@ -229,6 +235,7 @@ def run_hybrid_graph_retrieval(graph: Neo4jGraph = None, corpus: List[Document]
229
  LIMIT 85
230
  """
231
  response = graph.query(query)
 
232
  retrieved_context = f"Important Relationships:{response}\n\n Additional Context:{unstructured_data}"
233
 
234
  else:
 
67
  unique_results.append(result)
68
  return unique_results
69
 
70
+ # Similarity search
71
  def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20, batch_size: int = 25) -> List[Dict[str, str]]:
72
  print(colored(f"\n\nStarting indexing and ranking with FastEmbeddings and FAISS for {len(corpus)} documents\n\n", "green"))
73
  CACHE_DIR = "/app/fastembed_cache"
 
78
  try:
79
  # Initialize an empty FAISS index
80
  index = None
81
+ docstore = InMemoryDocstore({}) # stores documents and their metadata
82
  index_to_docstore_id = {}
83
 
84
  # Process documents in batches
85
  for i in range(0, len(corpus), batch_size):
86
  batch = corpus[i:i+batch_size]
87
+ # Extract content and metadata
88
  texts = [doc.page_content for doc in batch]
89
  metadatas = [doc.metadata for doc in batch]
90
 
 
216
 
217
  return final_results
218
 
219
+ # TODO optimize the retrieval
220
  def run_hybrid_graph_retrieval(graph: Neo4jGraph = None, corpus: List[Document] = None, query: str = None, hybrid: bool = False):
221
  print(colored(f"\n\nInitiating Retrieval...\n\n", "green"))
222
 
223
  if hybrid:
224
  print(colored("Running Hybrid Retrieval...", "yellow"))
225
+ # Similarity search
226
+ unstructured_data = index_and_rank(corpus, query) # similarity ranking
227
 
228
+ # Cypher query language
229
+ # e.g., MATCH finds paths where a directed relationship r leads from node n to node m.
230
+ # The WHERE clause keeps only nodes n that have more than 30 relationships to other nodes. COUNT {(n)--()} counts every relationship attached to n, so only well-connected nodes are considered.
231
  query = f"""
232
  MATCH p = (n)-[r]->(m)
233
  WHERE COUNT {{(n)--()}} > 30
 
235
  LIMIT 85
236
  """
237
  response = graph.query(query)
238
+ # This context is then passed to the LLM to generate the response
239
  retrieved_context = f"Important Relationships:{response}\n\n Additional Context:{unstructured_data}"
240
 
241
  else: