WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation Paper • 2503.07265 • Published 2 days ago • 4
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 2 days ago • 50
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate Paper • 2410.07167 • Published Oct 9, 2024 • 38
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning Paper • 2410.06373 • Published Oct 8, 2024 • 36
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published Oct 7, 2024 • 45
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • Updated Sep 26, 2024 • 10.5M • • 489