Distilling Vision-Language Models on Millions of Videos Paper • 2401.06129 • Published Jan 11, 2024 • 15
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8, 2024 • 21
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding Paper • 2406.07471 • Published Jun 11, 2024 • 1
VISA: Reasoning Video Object Segmentation via Large Language Models Paper • 2407.11325 • Published Jul 16, 2024 • 1
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22, 2024 • 40
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published Sep 2, 2024 • 27
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published Sep 2, 2024 • 14