- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 123
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 124
- TiC-CLIP: Continual Training of CLIP Models
  Paper • 2310.16226 • Published • 7
Collections including paper arxiv:2403.09611

---

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 123
- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
  Paper • 2403.09029 • Published • 54
- GiT: Towards Generalist Vision Transformer through Universal Language Interface
  Paper • 2403.09394 • Published • 25

---

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 123
- OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
  Paper • 2306.16527 • Published • 44
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 36
- SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
  Paper • 2404.16790 • Published • 7

---

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 123
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 47
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 9
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 63