Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published 21 days ago • 42
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published Sep 14 • 7
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 48
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models Paper • 2405.13974 • Published May 22 • 9
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 124
view article Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16 • 17