Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling • Paper • 2409.05395 • Published Sep 9, 2024