SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding Paper • 2401.09340 • Published Jan 17 • 19
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22 • 26
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding Paper • 2401.15708 • Published Jan 28 • 11