BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published 30 days ago • 11
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published Jul 3, 2024 • 29
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8, 2024 • 38