LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • 2411.10440 • Published 6 days ago • 89
VideoAgent: Long-form Video Understanding with Large Language Model as Agent • Paper • 2403.10517 • Published Mar 15 • 31
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting • Paper • 2402.06149 • Published Feb 9 • 17