Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Mar 15, 2024 • 7
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 28
CompCap: Improving Multimodal Large Language Models with Composite Captions Paper • 2412.05243 • Published 27 days ago • 18 • 4
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 124 • 5