L&V Models - a shawon Collection

L&V Models

Updated Oct 2

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published Feb 27 • 88

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Paper • 2403.13248 • Published Mar 20 • 78

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published Nov 9, 2023 • 48

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Paper • 2409.20551 • Published Sep 30 • 13

Visual Question Decomposition on Multimodal Large Language Models
Paper • 2409.19339 • Published Sep 28 • 7

Image Copy Detection for Diffusion Models
Paper • 2409.19952 • Published Sep 30 • 12

FreeInit: Bridging Initialization Gap in Video Diffusion Models
Paper • 2312.07537 • Published Dec 12, 2023 • 25