AIM: Autoregressive Image Models
Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, and Armand Joulin
This software project accompanies the research paper, Scalable Pre-training of Large Autoregressive Image Models.
We introduce AIM, a collection of vision models pre-trained with an autoregressive generative objective. We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its textual counterpart (i.e., Large Language Models). Specifically, we highlight two findings:
- the model capacity can be trivially scaled to billions of parameters, and
- AIM effectively leverages large collections of uncurated image data.
Installation
Please install PyTorch using the official installation instructions. Afterward, install the package as:
pip install git+https://git@github.com/apple/ml-aim.git
Usage
The example below loads a pre-trained model from the HuggingFace Hub and runs it on a single image:
from PIL import Image
from aim.torch.models import AIMForImageClassification
from aim.torch.data import val_transforms
img = Image.open(...)
model = AIMForImageClassification.from_pretrained("apple/aim-3B")
transform = val_transforms()
inp = transform(img).unsqueeze(0)
logits, features = model(inp)
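To read off predictions, the logits can be turned into class probabilities and ranked. The following is a minimal sketch in plain Python, assuming the logits for one image have already been converted to a list of floats (for example with `logits[0].tolist()`). The helper name `top_k_predictions` and the example values are hypothetical and not part of the `aim` package.

```python
import math


def top_k_predictions(logits, k=5):
    """Return the k most likely (class_index, probability) pairs.

    Hypothetical helper for illustration; not part of the aim package.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Made-up logits for a 4-class problem, used only to exercise the helper.
example = top_k_predictions([2.0, 0.5, 3.0, -1.0], k=2)
```

When working directly with tensors, `torch.softmax` and `torch.topk` perform the same two steps.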
ImageNet-1k results (frozen trunk)
The table below reports top-1 classification accuracy on the ImageNet-1k validation set with the trunk kept frozen, for the last layer and for the best layer.
model | top-1 IN-1k (last layer) | top-1 IN-1k (best layer) |
---|---|---|
AIM-0.6B | 78.5% | 79.4% |
AIM-1B | 80.6% | 82.3% |
AIM-3B | 82.2% | 83.3% |
AIM-7B | 82.4% | 84.0% |
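The metric in the table is top-1 accuracy: the fraction of images whose highest-scoring class matches the ground-truth label. The sketch below shows that computation in plain Python. It is an illustration of the metric only, not the evaluation code used to produce the numbers above; the function name `top1_accuracy` and the toy data are hypothetical.

```python
def top1_accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(batch_logits) != len(labels):
        raise ValueError("batch_logits and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty batch")
    correct = 0
    for row, label in zip(batch_logits, labels):
        # The predicted class is the index of the largest logit.
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == label)
    return correct / len(labels)


# Toy data: four samples over three classes (made-up values).
toy_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
toy_labels = [1, 0, 2, 1]
accuracy = top1_accuracy(toy_logits, toy_labels)
```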