---
license: other
license_name: apple-sample-code-license
license_link: LICENSE
library_name: ml-aim
pipeline_tag: image-classification
---

# AIM: Autoregressive Image Models

*Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, and Armand Joulin*

This software project accompanies the research paper, [Scalable Pre-training of Large Autoregressive Image Models](https://arxiv.org/abs/2401.08541).

We introduce **AIM**, a collection of vision models pre-trained with an autoregressive generative objective. We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its textual counterparts (i.e., Large Language Models). Specifically, we highlight two findings:

1. the model capacity can be trivially scaled to billions of parameters, and
2. AIM effectively leverages large collections of uncurated image data.

## Installation

Please install PyTorch using the official [installation instructions](https://pytorch.org/get-started/locally/). Afterward, install the package as:

```commandline
pip install git+https://git@github.com/apple/ml-aim.git
```

## Usage

Below we provide an example of loading the model via [HuggingFace Hub](https://huggingface.co/docs/hub/):

```python
from PIL import Image

from aim.torch.models import AIMForImageClassification
from aim.torch.data import val_transforms

img = Image.open(...)
model = AIMForImageClassification.from_pretrained("apple/aim-3B")
transform = val_transforms()

inp = transform(img).unsqueeze(0)
logits, features = model(inp)
```

### ImageNet-1k results (frozen trunk)

The table below contains the classification results on the ImageNet-1k validation set.
model | top-1 IN-1k (last layer) | top-1 IN-1k (best layer)
---|---|---
AIM-0.6B | 78.5% | 79.4%
AIM-1B | 80.6% | 82.3%
AIM-3B | 82.2% | 83.3%
AIM-7B | 82.4% | 84.0%
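
As a minimal end-to-end sketch tying the usage example to the classification results above, the snippet below turns the returned logits into a top-1 ImageNet-1k prediction. It reuses the `apple/aim-3B` checkpoint from the usage example; the image path `"img.jpg"` is a placeholder, and the post-processing (softmax, argmax) is standard PyTorch rather than part of the `ml-aim` API.

```python
import torch
from PIL import Image

from aim.torch.models import AIMForImageClassification
from aim.torch.data import val_transforms

# "img.jpg" is a placeholder path; substitute any RGB image.
img = Image.open("img.jpg").convert("RGB")

model = AIMForImageClassification.from_pretrained("apple/aim-3B")
transform = val_transforms()

inp = transform(img).unsqueeze(0)
with torch.no_grad():
    logits, _ = model(inp)

# Convert the logits into a top-1 class index and its probability.
probs = logits.softmax(dim=-1)
top1_prob, top1_idx = probs.max(dim=-1)
print(f"predicted ImageNet-1k class index: {top1_idx.item()} (p={top1_prob.item():.3f})")
```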