Distilling Vision-Language Models on Millions of Videos
Paper • 2401.06129 • Published • 15
Koala: Key frame-conditioned long video-LLM
Paper • 2404.04346 • Published • 5
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Paper • 2404.05726 • Published • 20
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Paper • 2406.07471 • Published • 1
liu
che111
AI & ML interests
None yet
Organizations
Collections
8
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Paper • 2406.12275 • Published • 29
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Paper • 2405.15738 • Published • 43
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Paper • 2408.08872 • Published • 97
Models
1
Datasets
None public yet