---
library_name: diffusers
license: mit
pipeline_tag: unconditional-image-generation
---

# Autoregressive Image Generation without Vector Quantization

## About
MAR is an autoregressive image generation model that does not require vector quantization.
Instead of predicting discrete tokens, it operates in a continuous-valued space and uses a diffusion process to model the per-token probability distribution.
Training with this Diffusion Loss gives efficient, high-quality image generation while keeping the speed advantages of autoregressive sequence modeling.
Because no discrete tokenizer is needed, the approach also applies to continuous-valued domains beyond image synthesis.
It is based on the paper [Autoregressive Image Generation without Vector Quantization](https://arxiv.org/abs/2406.11838).
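
The Diffusion Loss idea described above can be illustrated with a small, dependency-free sketch. For one continuous token `x` and a conditioning vector `z` produced by the autoregressive model, the loss compares sampled Gaussian noise with the noise predicted from the noised token. The linear noise schedule, the function names, and the zero-output `zero_denoiser` stand-in are assumptions made for illustration only; in the paper the denoiser is a small MLP conditioned on `z`, and this is not the code used by this repository.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for an assumed linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def diffusion_loss(x, z, eps_model, alpha_bar, rng):
    """One-sample estimate of ||eps - eps_model(x_t | t, z)||^2 for one token."""
    t = rng.randrange(len(alpha_bar))        # random diffusion timestep
    eps = [rng.gauss(0.0, 1.0) for _ in x]   # noise with the token's shape
    a = alpha_bar[t]
    # Noised token: x_t = sqrt(a) * x + sqrt(1 - a) * eps
    x_t = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x, eps)]
    pred = eps_model(x_t, t, z)
    return sum((e - p) ** 2 for e, p in zip(eps, pred))


def zero_denoiser(x_t, t, z):
    """Hypothetical placeholder for the learned denoising network."""
    return [0.0] * len(x_t)


rng = random.Random(0)
alpha_bar = make_alpha_bar()
token = [0.3, -1.2, 0.7, 0.05]   # toy continuous token
condition = [0.0] * 8            # toy conditioning vector z
loss = diffusion_loss(token, condition, zero_denoiser, alpha_bar, rng)
print(f"toy diffusion loss: {loss:.4f}")
```

During training, this quantity would be averaged over tokens and timesteps and backpropagated into both the denoiser and the autoregressive model that produces `z`.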

## Usage
Load the model through the Hugging Face `DiffusionPipeline`. Parameters such as the model type, the number of autoregressive steps, and the class labels can be customized.

```python
from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained("jadechoghari/mar", trust_remote_code=True, custom_pipeline="jadechoghari/mar")

# generate an image with the model
generated_image = pipeline(
    model_type="mar_huge",         # choose from 'mar_base', 'mar_large', or 'mar_huge'
    seed=42,                       # set a seed for reproducibility
    num_ar_steps=64,               # number of autoregressive steps
    class_labels=[207, 360, 388],  # provide valid ImageNet class labels
    cfg_scale=4,                   # classifier-free guidance scale
    output_dir="./images",         # directory to save generated images
    cfg_schedule="constant",       # choose between 'constant' (suggested) and 'linear'
)

# display the generated image
generated_image.show()
```

<p align="center">
  <img src="https://github.com/LTH14/mar/raw/main/demo/visual.png" width="500">
</p>

This code loads the model, configures it for image generation, and saves the output to a specified directory. 
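
To give a feel for how `num_ar_steps`, `cfg_scale`, and `cfg_schedule` might interact, here is a minimal sketch. It assumes a cosine masking schedule over the token sequence and a guidance scale that is either fixed (`"constant"`) or grows from 1 to `cfg_scale` as more tokens are generated (`"linear"`). The helper names, the sequence length of 256, and the exact formulas are assumptions based on a reading of the official implementation; this pipeline's internals may differ.

```python
import math


def mask_len_at(step, num_ar_steps, seq_len):
    """Tokens still masked after 0-based `step`, under an assumed cosine schedule."""
    ratio = math.cos(math.pi / 2.0 * (step + 1) / num_ar_steps)
    return math.floor(seq_len * ratio)


def cfg_scale_at(cfg_scale, cfg_schedule, mask_len, seq_len):
    """Guidance scale for one step (hypothetical helper, not a pipeline API)."""
    if cfg_schedule == "constant":
        return float(cfg_scale)
    if cfg_schedule == "linear":
        # Ramp from 1 (all tokens masked) up to cfg_scale (no tokens masked).
        return 1.0 + (cfg_scale - 1.0) * (seq_len - mask_len) / seq_len
    raise ValueError(f"unknown cfg_schedule: {cfg_schedule!r}")


seq_len = 256  # assumed number of tokens per image
for step in (0, 31, 63):
    remaining = mask_len_at(step, num_ar_steps=64, seq_len=seq_len)
    scale = cfg_scale_at(4, "linear", remaining, seq_len)
    print(f"step {step:2d}: {remaining:3d} tokens masked, cfg={scale:.2f}")
```

Under these assumptions, the `"linear"` schedule applies weak guidance to the first tokens and full guidance to the last ones, while `"constant"` applies `cfg_scale` throughout.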

We offer three pre-trained MAR models in `safetensors` format:
- `mar-base.safetensors`
- `mar-large.safetensors`
- `mar-huge.safetensors`
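
The `model_type` values accepted by the pipeline use underscores, while the checkpoint files use hyphens. The sketch below shows one way to map between the two and to reject unknown model types early. The helper name and the mapping rule are assumptions inferred from the names listed here, not a documented part of the pipeline.

```python
MODEL_TYPES = ("mar_base", "mar_large", "mar_huge")


def checkpoint_filename(model_type):
    """Map a model_type such as 'mar_huge' to its assumed checkpoint filename."""
    if model_type not in MODEL_TYPES:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of {MODEL_TYPES}"
        )
    return model_type.replace("_", "-") + ".safetensors"


for name in MODEL_TYPES:
    print(name, "->", checkpoint_filename(name))
```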



This is a Hugging Face Diffusers/GPU implementation of the paper [Autoregressive Image Generation without Vector Quantization](https://arxiv.org/abs/2406.11838).

The official PyTorch implementation is released in [this repository](https://github.com/LTH14/mar).

```
@article{li2024autoregressive,
  title={Autoregressive Image Generation without Vector Quantization},
  author={Li, Tianhong and Tian, Yonglong and Li, He and Deng, Mingyang and He, Kaiming},
  journal={arXiv preprint arXiv:2406.11838},
  year={2024}
}
```

## Acknowledgements
We thank Congyue Deng and Xinlei Chen for helpful discussion. We thank
Google TPU Research Cloud (TRC) for granting us access to TPUs, and Google Cloud Platform for
supporting GPU resources.

A large portion of the code in this repo is based on [MAE](https://github.com/facebookresearch/mae), [MAGE](https://github.com/LTH14/mage), and [DiT](https://github.com/facebookresearch/DiT).

## Contact

If you have any questions, feel free to contact me through email (tianhong@mit.edu). Enjoy!