Collections
Collections including paper arxiv:2308.13418
- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
  Paper • 2306.17107 • Published • 11
- On the Hidden Mystery of OCR in Large Multimodal Models
  Paper • 2305.07895 • Published
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 6
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 47

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 175
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 26
- LayoutPrompter: Awaken the Design Ability of Large Language Models
  Paper • 2311.06495 • Published • 9
- Prompt Engineering a Prompt Engineer
  Paper • 2311.05661 • Published • 19

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 49
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 33
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 51
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 574

- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 33
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
  Paper • 2307.02499 • Published • 14
- Text Rendering Strategies for Pixel Language Models
  Paper • 2311.00522 • Published • 10