Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 27
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 4 days ago • 36
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 18 days ago • 48
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24 • 40
WAFFLE: Multi-Modal Model for Automated Front-End Development Paper • 2410.18362 • Published Oct 24 • 11
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published Oct 15 • 16
LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published Oct 2 • 25
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World Paper • 2402.19474 • Published Feb 29 • 2
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published Sep 20 • 67
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 47
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19 • 36
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published Sep 17 • 19
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 47
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining Paper • 2409.02326 • Published Sep 3 • 18