VLMS - a Deping Collection

VLMS

Updated Sep 22, 2024

PsiPi/liuhaotian_llava-v1.5-13b-GGUF

Image-Text-to-Text • Updated Mar 11, 2024 • 501 • 36
TRI-ML/prismatic-vlms

Image-to-Text • Updated May 6, 2024 • 16
bczhou/tiny-llava-v1-hf

Image-Text-to-Text • Updated Aug 17, 2024 • 1.37k • 56
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

Paper • 2402.06118 • Published Feb 9, 2024 • 14
LEGO: Language Enhanced Multi-modal Grounding Model

Paper • 2401.06071 • Published Jan 11, 2024 • 10
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 46
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Paper • 2403.16999 • Published Mar 25, 2024 • 4
Salesforce/instructblip-vicuna-7b

Image-Text-to-Text • Updated Nov 21, 2024 • 236k • 87
Pegasus-v1 Technical Report

Paper • 2404.14687 • Published Apr 23, 2024 • 31
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 17
Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 53