VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467 • Published Dec 5, 2024 • 105
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents • Paper • 2410.10594 • Published Oct 14, 2024 • 24
meta-llama/Llama-3.2-11B-Vision-Instruct • Image-Text-to-Text • Updated Dec 4, 2024 • 2.51M • 1.2k