arxiv:2406.11837

Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

Published on Jun 17, 2024
Abstract

In the realm of image quantization exemplified by VQGAN, the process encodes images into discrete tokens drawn from a codebook with a predefined size. Recent advancements, particularly with LLAMA 3, reveal that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, continue to grapple with challenges related to expanding the codebook size and enhancing codebook utilization. For instance, VQGAN-FC is restricted to learning a codebook with a maximum size of 16,384, maintaining a typically low utilization rate of less than 12% on ImageNet. In this work, we propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000, achieving a utilization rate exceeding 99%. Unlike previous methods that optimize each codebook entry, our approach begins with a codebook initialized with 100,000 features extracted by a pre-trained vision encoder. Optimization then focuses on training a projector that aligns the entire codebook with the feature distributions of the encoder in VQGAN-LC. We demonstrate the superior performance of our model over its counterparts across a variety of tasks, including image reconstruction, image classification, auto-regressive image generation using GPT, and image creation with diffusion- and flow-based generative models. Code and models are available at https://github.com/zh460045050/VQGAN-LC.
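To make the core idea concrete, the sketch below illustrates how a frozen codebook initialized from pre-trained encoder features could be paired with a small trainable projector, as the abstract describes. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation; the class and parameter names (`FrozenCodebookQuantizer`, `init_features`, `latent_dim`) and the single-linear-layer projector are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class FrozenCodebookQuantizer(nn.Module):
    """Sketch of the VQGAN-LC idea: keep a large codebook frozen
    (initialized from pre-trained vision-encoder features, e.g. 100,000 entries)
    and train only a projector that maps it into the VQGAN latent space."""

    def __init__(self, init_features: torch.Tensor, latent_dim: int):
        super().__init__()
        # init_features: (K, D) features taken from a pre-trained encoder.
        # Registered as a buffer so it is never updated by the optimizer.
        self.register_buffer("codebook", init_features)
        # Trainable projector aligning the frozen codebook with the
        # encoder's latent distribution (a single linear layer here;
        # the paper's actual projector may differ).
        self.projector = nn.Linear(init_features.shape[1], latent_dim)

    def forward(self, z: torch.Tensor):
        # z: (B, N, latent_dim) continuous encoder outputs.
        codes = self.projector(self.codebook)                       # (K, latent_dim)
        batched_codes = codes.unsqueeze(0).expand(z.shape[0], -1, -1)
        dists = torch.cdist(z, batched_codes)                       # (B, N, K)
        indices = dists.argmin(dim=-1)                              # (B, N) token ids
        z_q = codes[indices]                                        # quantized latents
        # Straight-through estimator so gradients still reach the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, indices
```

Because only the projector (and the usual encoder/decoder) receives gradients, every codebook entry remains anchored to a real feature cluster, which is one plausible reading of why utilization can stay near 99% even at 100,000 entries.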
