PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published about 1 month ago • 120
VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467 • Published 30 days ago • 105
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance • Paper • 2412.02687 • Published Dec 3, 2024 • 108
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video • Paper • 2411.18671 • Published Nov 27, 2024 • 20
Small Language Models: Survey, Measurements, and Insights • Paper • 2409.15790 • Published Sep 24, 2024 • 1
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 22 days ago • 136
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published 22 days ago • 80