DepthPro Models • Collection • Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • 4 items • Updated 16 days ago • 7
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding • Paper • 2502.01341 • Published 20 days ago • 35
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers • Paper • 1906.02792 • Published Jun 6, 2019
STIV: Scalable Text and Image Conditioned Video Generation • Paper • 2412.07730 • Published Dec 10, 2024 • 71