Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published 13 days ago
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published 8 days ago
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 7 days ago
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning Paper • 2410.06373 • Published Oct 8
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification Paper • 2410.05057 • Published Oct 7
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19
Rejuvenating image-GPT as Strong Visual Representation Learners Paper • 2312.02147 • Published Dec 4, 2023