Papers
arxiv:2405.21060

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Published on May 31
· Submitted by akhaliq on Jun 3
#1 Paper of the day
Authors: Tri Dao, Albert Gu

Abstract

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
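The duality the abstract refers to can be illustrated in its simplest (scalar, 1-semiseparable) case: a time-varying linear recurrence computed in O(T) is exactly a matrix-vector product with a lower-triangular semiseparable matrix, which is the quadratic, attention-like view. The sketch below is an illustrative toy, not the paper's actual SSD algorithm or the Mamba-2 layer; all names are hypothetical.

```python
import numpy as np

def ssm_recurrent(a, b, c, x):
    """Linear-time recurrent view: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h = 0.0
    y = np.empty_like(x)
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y

def ssm_semiseparable(a, b, c, x):
    """Quadratic 'attention' view: y = M x, where M is 1-semiseparable:
    M[i, j] = c_i * (a_{j+1} * ... * a_i) * b_j for j <= i, else 0."""
    T = len(x)
    M = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1):
            M[i, j] = c[i] * np.prod(a[j + 1 : i + 1]) * b[j]
    return M @ x

# The two computational modes of the same model agree.
rng = np.random.default_rng(0)
T = 16
a, b, c, x = (rng.standard_normal(T) for _ in range(4))
assert np.allclose(ssm_recurrent(a, b, c, x), ssm_semiseparable(a, b, c, x))
```

The paper's SSD algorithm exploits richer block decompositions of such semiseparable matrices to get a layer that is fast in both training (matmul-friendly) and inference (recurrent) modes; this toy only shows the equivalence itself.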

Community

I have been waiting for your new paper, Professor Gu and Dao!

How does it compare with YOCO?

They aren't related

Models citing this paper 10

Datasets citing this paper 0

Spaces citing this paper 1

Collections including this paper 15