che111's Collections
Localize Visual Understanding
GLaMM: Pixel Grounding Large Multimodal Model
Paper • 2311.03356 • Published • 33
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper • 2311.07575 • Published • 13
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Paper • 2311.03354 • Published • 4
Language-Informed Visual Concept Learning
Paper • 2312.03587 • Published • 5
Denoising Vision Transformers
Paper • 2401.02957 • Published • 28
Learning Anatomically Consistent Embedding for Chest Radiography
Paper • 2312.00335 • Published
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision
Paper • 2404.15672 • Published
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Paper • 2406.19389 • Published • 52
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Paper • 2406.17770 • Published • 18
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Paper • 2407.16198 • Published • 13
Contrastive Localized Language-Image Pre-Training
Paper • 2410.02746 • Published • 33