

Rajdeep Ghosh

rumbleFTW


AI & ML interests

Transformers, GANs, Audio synthesis, LLMs, Diffusion.

Recent Activity


reacted to merve's post with ❤️ and 👍 5 months ago

If you have documents that contain more than just text and you're doing retrieval or RAG (using OCR and LLMs), give it up and give ColPali and vision language models a try 🤗

Why? Documents consist of multiple modalities: layout, tables, text, charts, images. Document processing pipelines often chain multiple models, which makes them immensely brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma. It operates directly over image patches, and indexing takes far less time while being more accurate. You can use it for retrieval, and if you want to do retrieval-augmented generation, find the closest document and, instead of processing it, pass it directly to a VLM like Qwen2-VL as image input along with your text query. 🤝

This is much faster, you do not lose any information, and it is much easier to maintain too! 🥳

Multimodal RAG https://huggingface.co/collections/merve/multimodal-rag-66d97602e781122aae0a5139 💬

Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) https://huggingface.co/collections/merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e 📖
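A minimal sketch of the ColBERT-style late-interaction ("MaxSim") scoring that the post refers to, assuming the query tokens and the page's image patches have already been embedded into vectors of the same dimension. Each query vector is matched against its best patch, and those best matches are summed into a page score. The function names and toy vectors below are hypothetical illustrations, not the ColPali library's API.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: List[Vector], page: List[Vector]) -> float:
    """Late-interaction score: for each query-token embedding, take its
    best match among the page's patch embeddings, then sum those maxima."""
    return sum(max(dot(q, p) for p in page) for q in query)


def rank_pages(query: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids sorted from best to worst MaxSim score."""
    return sorted(pages, key=lambda pid: maxsim_score(query, pages[pid]), reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical); a real model produces one
    # high-dimensional vector per query token and per image patch.
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        "invoice.png": [[0.9, 0.1], [0.2, 0.8]],
        "chart.png": [[0.1, 0.2], [0.3, 0.1]],
    }
    print(rank_pages(query, pages))
```

Here "invoice.png" scores 0.9 + 0.8 = 1.7 and "chart.png" scores 0.3 + 0.2 = 0.5, so the invoice page ranks first; in the RAG flow described above, that top page image would then be handed to the VLM together with the text query.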



rumbleFTW's activity

liked a model 29 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 10 days ago • 1.59M downloads • 3.66k likes

liked a model 6 months ago

deepseek-ai/DeepSeek-V2.5

Text Generation • Updated Dec 11, 2024 • 3.8k downloads • 701 likes

liked 3 models 7 months ago

sarvamai/shuka-1

Audio-Text-to-Text • Updated 39 minutes ago • 695 downloads • 47 likes

sarvamai/sarvam-0.5

Text Generation • Updated Nov 8, 2024 • 954 downloads • 86 likes

parler-tts/parler-tts-large-v1

Text-to-Speech • Updated Nov 22, 2024 • 22.8k downloads • 245 likes

liked 4 models 8 months ago

meta-llama/Llama-3.1-8B-Instruct

Text Generation • Updated Sep 25, 2024 • 6.12M downloads • 3.75k likes

apple/DCLM-7B

Updated Jul 26, 2024 • 781 downloads • 835 likes

Groq/Llama-3-Groq-8B-Tool-Use

Text Generation • Updated Aug 27, 2024 • 853 downloads • 274 likes

google/gemma-2-9b

Text Generation • Updated Aug 7, 2024 • 103k downloads • 654 likes

liked 2 models 9 months ago

hubertsiuzdak/snac_24khz

Updated Apr 3, 2024 • 22.9k downloads • 17 likes

bitext/Mistral-7B-Customer-Support

Text Generation • Updated Jul 25, 2024 • 229 downloads • 9 likes

liked a model 10 months ago

meta-llama/Meta-Llama-3-8B

Text Generation • Updated Sep 27, 2024 • 420k downloads • 6.08k likes

liked a model 11 months ago

facebook/wav2vec2-base-960h

Automatic Speech Recognition • Updated Nov 14, 2022 • 4.03M downloads • 324 likes

liked a model 12 months ago

xai-org/grok-1

Text Generation • Updated Mar 28, 2024 • 1.28k downloads • 2.28k likes

liked a Space about 1 year ago

OutfitAnyone

Generate virtual try-on results for clothing

liked 2 models about 1 year ago

sarvamai/OpenHathi-7B-Hi-v0.1-Base

Text Generation • Updated Dec 22, 2023 • 2.97k downloads • 107 likes

microsoft/phi-2

Text Generation • Updated Apr 29, 2024 • 394k downloads • 3.29k likes

liked a Space about 1 year ago

PDF Chatbot

Ask questions about PDF documents

liked a model about 1 year ago

mistralai/Mistral-7B-v0.1

Text Generation • Updated Jul 24, 2024 • 354k downloads • 3.65k likes