Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models • Paper 2410.03290 • Published Oct 4
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video • Paper 2411.18671 • Published 19 days ago
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation • Paper 2412.00927 • Published 15 days ago