AIM: Autoregressive Image Models

Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, and Armand Joulin

This software project accompanies the research paper, Scalable Pre-training of Large Autoregressive Image Models.

We introduce AIM, a collection of vision models pre-trained with an autoregressive generative objective. We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its textual counterpart (i.e., Large Language Models). Specifically, we highlight two findings:

the model capacity can be trivially scaled to billions of parameters, and
AIM effectively leverages large collections of uncurated image data.

Installation

Please install PyTorch using the official installation instructions. Afterward, install the package as:

pip install git+https://git@github.com/apple/ml-aim.git

Usage

Below is an example of loading the model from the HuggingFace Hub:

from PIL import Image

from aim.torch.models import AIMForImageClassification
from aim.torch.data import val_transforms

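# Load the input image and the pre-trained classifier from the Hub.
img = Image.open(...)
model = AIMForImageClassification.from_pretrained("apple/aim-3B")
transform = val_transforms()

# Preprocess the image and add a batch dimension before the forward pass.
inp = transform(img).unsqueeze(0)
logits, features = model(inp)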
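
The `logits` returned above are unnormalized class scores. As a minimal sketch of how they might be turned into ranked predictions, the helper below applies a softmax and keeps the `k` most likely classes. The name `top_k_probs` and the example scores are illustrative and not part of the `aim` package; the sketch also assumes `logits` has shape `(1, num_classes)`, so that `logits[0].tolist()` yields one plain list of floats.

```python
import math

def top_k_probs(logits_row, k=5):
    """Return the k most likely (class_index, probability) pairs."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits_row)
    exps = [math.exp(x - m) for x in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

# Hypothetical 4-class scores standing in for one row of `logits`.
example = [2.0, 0.5, -1.0, 3.0]
top2 = top_k_probs(example, k=2)
```

With real model output, `top_k_probs(logits[0].tolist())` would give the five highest-probability class indices. Mapping those indices to class names requires a label list, which this README does not specify.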

ImageNet-1k results (frozen trunk)

The table below contains the classification results on the ImageNet-1k validation set.

model	top-1 IN-1k (last layer)	top-1 IN-1k (best layer)
AIM-0.6B	78.5%	79.4%
AIM-1B	80.6%	82.3%
AIM-3B	82.2%	83.3%
AIM-7B	82.4%	84.0%
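
For reference, top-1 accuracy is the fraction of validation images whose highest-scoring class matches the ground-truth label. The sketch below shows that computation on made-up data. The function name `top1_accuracy` and the sample values are illustrative only, and this is not the evaluation code used to produce the table.

```python
def top1_accuracy(predicted, labels):
    """Fraction of samples whose predicted class equals the true label."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    # Count exact matches between predicted and true class indices.
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)

# Made-up predictions and labels, for illustration only.
acc = top1_accuracy([3, 1, 2, 2], [3, 0, 2, 2])
```

Here three of the four predictions match their labels, so `acc` is 0.75.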
