vincentg64 posted an update Nov 29, 2024
LLM Deep Contextual Retrieval and Multi-Index Chunking: Nvidia PDFs, Case Study https://mltblog.com/3OBfU2p

The technology described here makes LLM prompt results more exhaustive and better structured by efficiently exploiting the knowledge graph and contextual structure present in any professional or enterprise corpus. The case study uses public financial reports from Nvidia, available as PDF documents.

In this article, I discuss the preprocessing steps used to turn a PDF repository into input suitable for LLMs. These steps include contextual chunking, indexing text entities with a hierarchical multi-index system, and retrieving contextual elements such as lists, sub-lists, fonts (type, color, and size), images, and tables, some of which are not detected by standard Python libraries. I also discuss how to build additional contextual information, such as agents, categories, or tags, and attach it to text entities to further improve any LLM architecture and its prompt results.
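
To make the idea of a hierarchical multi-index more concrete, here is a minimal Python sketch. It is not the implementation from the article: the key layout `(doc, page, block, item)`, the `Chunk` fields, and the helpers `build_index`, `children`, and `search` are all assumptions chosen for illustration, and the sample text is placeholder data rather than content from the Nvidia reports.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A text entity plus the contextual elements attached to it (hypothetical fields)."""
    text: str
    kind: str = "paragraph"      # e.g. "title", "list_item", "table"
    font_size: float = 10.0      # fonts can hint at titles vs. body text
    tags: set = field(default_factory=set)


def build_index(entries):
    """Map hierarchical keys (doc, page, block, item) to chunks."""
    return dict(entries)


def children(index, prefix):
    """Return (key, chunk) pairs whose key starts with `prefix`, in key order."""
    n = len(prefix)
    matches = ((k, c) for k, c in index.items() if k[:n] == prefix)
    return sorted(matches, key=lambda kc: kc[0])


def search(index, keyword):
    """Return (key, text, parent_text) for chunks containing `keyword`.

    The parent is assumed to be item 0 of the same block, for example
    the title that introduces a list.
    """
    kw = keyword.lower()
    hits = []
    for key, chunk in sorted(index.items(), key=lambda kc: kc[0]):
        if kw in chunk.text.lower():
            parent = index.get(key[:3] + (0,))
            hits.append((key, chunk.text, parent.text if parent else None))
    return hits


# Placeholder data: made up for this sketch, not extracted from any PDF.
index = build_index([
    (("doc_a", 1, 0, 0), Chunk("Section A title", kind="title", font_size=14.0)),
    (("doc_a", 1, 0, 1), Chunk("first item under section A", kind="list_item")),
    (("doc_a", 1, 0, 2), Chunk("second item under section A", kind="list_item")),
    (("doc_a", 2, 0, 0), Chunk("Section B title", kind="title", font_size=14.0)),
])

for key, chunk in children(index, ("doc_a", 1)):
    print(key, chunk.kind, chunk.text)

print(search(index, "second"))
```

Because every chunk carries its position in the hierarchy, a keyword hit can be returned together with its surrounding context (here, the title of the block it belongs to) instead of as an isolated fragment.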