When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning Paper • 2503.07588 • Published 3 days ago • 3
Cost-Optimal Grouped-Query Attention for Long-Context LLMs Paper • 2503.09579 • Published 1 day ago • 3
Quantizing Large Language Models for Code Generation: A Differentiated Replication Paper • 2503.07103 • Published 4 days ago • 6
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 1 day ago • 35
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Paper • 2503.03734 • Published 8 days ago • 1
"Principal Components" Enable A New Language of Images Paper • 2503.08685 • Published 2 days ago • 10
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 3 days ago • 89
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published 6 days ago • 30
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 3 days ago • 31
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models Paper • 2503.08686 • Published 2 days ago • 14
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 3 days ago • 24
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 3 days ago • 70
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published 2 days ago • 23
UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Paper • 2503.08120 • Published 3 days ago • 27