Shreyas094 
posted an update Sep 10
Help me to upgrade my model.

Hi all. I am a complete beginner in coding; however, with the help of Claude (similar to Matt :P) and GPT-4o, I have been able to develop this RAG PDF summarizer/Q&A plus a web search tool.

The application is built specifically for summarization tasks, including summarizing financial documents, news articles, resumes, research documents, call transcripts, etc.

The Space can be found here: Shreyas094/SearchGPT

The news tool simply uses DuckDuckGo chat to generate the search results using the Llama 3.1 70B model.

I would like your help fine-tuning the retrieval step so that it handles more unstructured documents.

I think changing this would change the search results somewhat, but there don't seem to be too many options to choose from.
I can give you some advice if I know how you want to enhance it.

https://huggingface.co/spaces/Shreyas094/SearchGPT/blob/main/app.py

def get_web_search_results(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    try:
        results = list(DDGS().text(query, max_results=max_results))

https://pypi.org/project/duckduckgo-search/#2-text---text-search-by-duckduckgocom
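For reference, here is a rough sketch of the few options that `DDGS().text` takes, assuming the `duckduckgo_search` package linked above. The wrapper name `search_with_options` and the `search_fn` parameter are made up for illustration and are not in app.py; `region`, `safesearch`, `timelimit`, and `max_results` are the library's own keyword arguments.

```python
from typing import Callable, Dict, List, Optional

SearchFn = Callable[..., List[Dict[str, str]]]


def search_with_options(
    query: str,
    max_results: int = 10,
    region: str = "wt-wt",            # "wt-wt" means no region preference
    safesearch: str = "moderate",     # "on", "moderate", or "off"
    timelimit: Optional[str] = None,  # "d", "w", "m", "y", or None for no limit
    search_fn: Optional[SearchFn] = None,  # hypothetical hook, handy for testing
) -> List[Dict[str, str]]:
    if search_fn is None:
        # Imported lazily so the sketch can be read without the package installed.
        from duckduckgo_search import DDGS
        search_fn = DDGS().text
    try:
        return list(
            search_fn(
                query,
                region=region,
                safesearch=safesearch,
                timelimit=timelimit,
                max_results=max_results,
            )
        )
    except Exception as exc:
        # Fall back to an empty list so the caller can continue without web results.
        print(f"Web search failed: {exc}")
        return []
```

For a news-style query, something like `search_with_options("central bank rate decision", timelimit="w")` would restrict results to the past week, which is probably the option with the most visible effect.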

·

Hi John, thanks so much for the contribution. However, I would like to implement some upgrades to my RAG setup for the PDF summarization task. So far I have not worked a lot on the vector DB creation, chunking, indexing, and embedding parts. I feel that working on these should improve the retrieval process, especially for 100-200 page research documents. If possible, could you provide some suggestions on that part? Thanks.
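To make the chunking part concrete, here is a minimal sketch of overlapping word-window chunking, one common starting point for long documents. It is not what the Space currently does; `chunk_text`, `chunk_size`, and `overlap` are names and defaults assumed for illustration, and good values would need to be tuned against real 100-200 page PDFs.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which may help retrieval on documents without clean section breaks.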

Bro, the (similar to Matt) killed me XD

·

Hahaha, at least someone got it