Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 19 items • Updated 14 days ago • 11
MobileNetV4 pretrained weights Collection Weights for MobileNet-V4 pretrained in timm • 13 items • Updated 2 days ago • 8
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published 9 days ago • 10
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published 14 days ago • 38
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published 29 days ago • 12
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
view article Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16 • 15
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 11 items • Updated May 17 • 118
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 572
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23 • 31
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22 • 21
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark Paper • 2401.11944 • Published Jan 22 • 24
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16 • 35
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated May 14 • 28
Distributed Inference and Fine-tuning of Large Language Models Over The Internet Paper • 2312.08361 • Published Dec 13, 2023 • 23
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 168