Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published 23 days ago • 41
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 16
Multimodal-SAE Collection A collection of the SAEs hooked onto LLaVA • 4 items • Updated Nov 25, 2024 • 4
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 30 days ago • 119
Article LLaVA-o1: Let Vision Language Models Reason Step-by-Step By mikelabs • Nov 19, 2024 • 11
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 58
Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 • Nov 21, 2024 • 35
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 44
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12, 2024 • 21
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models Paper • 2411.05005 • Published Nov 7, 2024 • 13
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation Paper • 2410.20474 • Published Oct 27, 2024 • 14
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21, 2024 • 19
Article Running Large Multimodal Models on an AI PC's NPU By bconsolvo • Jun 11, 2024 • 14
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22, 2024 • 45
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17, 2024 • 53