Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Published 5 days ago
Article: π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • Published Feb 4
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity • Paper • 2502.13063 • Published 26 days ago
SurveyX: Academic Survey Automation via Large Language Models • Paper • 2502.14776 • Published 24 days ago
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? • Paper • 2502.14502 • Published 24 days ago
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines • Paper • 2502.14739 • Published 24 days ago
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding • Paper • 2502.08946 • Published Feb 13
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published 24 days ago
Unified Reward Model for Multimodal Understanding and Generation • Paper • 2503.05236 • Published 9 days ago
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers • Paper • 2502.15007 • Published 24 days ago
MLGym: A New Framework and Benchmark for Advancing AI Research Agents • Paper • 2502.14499 • Published 24 days ago
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs • Paper • 2503.01743 • Published 13 days ago
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models • Paper • 2502.09696 • Published about 1 month ago