Hugging Face
s k
madstuntman11
0 followers · 26 following
AI & ML interests
None yet
Recent Activity
Reacted to merve's post with ❤️ 27 days ago
If your documents contain more than just text and you're doing retrieval or RAG with OCR and LLMs, give that up and try ColPali and vision language models instead.

Why? Documents consist of multiple modalities: layout, tables, text, charts, images. Document processing pipelines often chain multiple models, which makes them immensely brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma. It operates directly over image patches, and indexing takes far less time while being more accurate. You can use it for retrieval. If you want to do retrieval-augmented generation, find the closest document and, without processing it, give it directly to a VLM like Qwen2-VL as image input, together with your text query.

This is much faster, you do not lose any information, and it is much easier to maintain too! 🥳

Multimodal RAG: https://huggingface.co/collections/merve/multimodal-rag-66d97602e781122aae0a5139
Document AI (made well before this, for folks who want structured input/output and can fine-tune a model): https://huggingface.co/collections/merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e
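To make the "ColBERT-like" retrieval in the post above more concrete, here is a minimal sketch of late-interaction (MaxSim) scoring, the scheme ColBERT-style retrievers are generally described as using. It is an illustration under stated assumptions, not ColPali's actual code: the function names (`dot`, `maxsim_score`, `rank_pages`) and the tiny 2-dimensional embeddings are hypothetical, and a real model would produce one embedding per query token and per image patch.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for each query-token embedding, take its best
    match among the page's patch embeddings, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(
    query_tokens: Sequence[Vector], pages: Sequence[Sequence[Vector]]
) -> List[Tuple[int, float]]:
    """Return (page_index, score) pairs, best-scoring page first."""
    scored = [(i, maxsim_score(query_tokens, patches)) for i, patches in enumerate(pages)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings (hypothetical values, chosen only to keep the arithmetic visible).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.5, 0.5]]
page_b = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.2]]

print(rank_pages(query, [page_a, page_b]))
```

In this toy case `page_b` ranks first because each query token finds a perfectly matching patch on it. The top-ranked page image would then be handed, unprocessed, to a vision language model together with the text query.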
Reacted to merve's post with ❤️ 27 days ago
I have put together a notebook on multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- https://huggingface.co/vidore/colpali for retrieval: it doesn't need indexing with image-text pairs, just images!
- https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct for generation: directly feed images as-is to a vision language model, with no conversion to text!

I used the ColPali implementation from the new Byaldi library by @bclavie: https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
Upvoted an article 27 days ago
Document Similarity Search with ColPali
Organizations
None yet
madstuntman11's activity
liked a Space about 1 month ago
Diffusers Image Outpaint • Running • 5
liked a model 4 months ago
manu/colpali-3b-mix-448-docmatix-only-mined-ib • Updated Jul 31 • 3 • 2
liked a Space 4 months ago
Vidore Leaderboard • Running • 83
liked a model 4 months ago
nvidia/MambaVision-T-1K • Image Feature Extraction • Updated Jul 25 • 5.19k • 26
liked a Space 4 months ago
Florence 2 • Running on Zero • 15
liked a dataset 4 months ago
Tevatron/docmatix-ir • Viewer • Updated Aug 12 • 5.61M • 3.11k • 12
liked 2 models 4 months ago
Tevatron/dse-phi3-docmatix-v1 • Updated Aug 12 • 51 • 9
manu/colpali-3b-mix-448-docmatix • Updated Jul 23 • 7
liked a Space 5 months ago
Chatbot Arena Leaderboard • Running • 3.71k