Deping's Collections
VLMS
PsiPi/liuhaotian_llava-v1.5-13b-GGUF • Image-Text-to-Text • 501 • 36
TRI-ML/prismatic-vlms • Image-to-Text • 16
bczhou/tiny-llava-v1-hf • Image-Text-to-Text • 1.37k • 56
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling • Paper • 2402.06118 • 14
LEGO:Language Enhanced Multi-modal Grounding Model • Paper • 2401.06071 • 10
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models • Paper • 2403.18814 • 46
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models • Paper • 2403.16999 • 4
Salesforce/instructblip-vicuna-7b • Image-Text-to-Text • 236k • 87
Pegasus-v1 Technical Report • Paper • 2404.14687 • 31
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs • Paper • 2404.16375 • 17
Needle In A Multimodal Haystack • Paper • 2406.07230 • 53