Scalable Autoregressive Image Generation with Mamba
Abstract
We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Unlike existing methods that adapt Mamba to handle two-dimensional signals via multi-directional scan, AiM directly utilizes the next-token prediction paradigm for autoregressive image generation. This approach circumvents the need for extensive modifications to enable Mamba to learn 2D spatial representations. By implementing straightforward yet strategically targeted modifications for visual generative tasks, we preserve Mamba's core structure, fully exploiting its efficient long-sequence modeling capabilities and scalability. We provide AiM models in various scales, with parameter counts ranging from 148M to 1.3B. On the ImageNet1K 256*256 benchmark, our best AiM model achieves a FID of 2.21, surpassing all existing AR models of comparable parameter counts and demonstrating significant competitiveness against diffusion models, with 2 to 10 times faster inference speed. Code is available at https://github.com/hp-l33/AiM
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba (2024)
- VideoMamba: Spatio-Temporal Selective State Space Model (2024)
- ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 (2024)
- Discrete Diffusion Language Model for Long Text Summarization (2024)
- MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 4
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper