IamYash/ImageNet-Tiny-AttentionOnly

ImageNet Results

In this experiment, we assess the performance of Mice ViTs on ImageNet, a more complex and diverse dataset. We trained Mice ViTs to classify the 1000 ImageNet classes.

Training Details

As in the dSprites experiment, for each number of attention layers we trained two model variants: an attention-only model and a model that combines attention with the MLP module. For simplicity, dropout and layer normalization were not applied. The detailed training logs and metrics can be found here.
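
To make the experimental grid concrete, the setup described above can be written down as a small configuration sketch. This is only an illustration: the `MiceViTConfig` class and all of its field names are hypothetical and do not come from the training code; only the size names, layer counts, the two variants, and the absence of dropout and layer normalization are taken from this card.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MiceViTConfig:
    """Hypothetical config record; field names are illustrative only."""
    size: str
    num_layers: int
    attention_only: bool          # True: attention only; False: attention + MLP
    dropout: float = 0.0          # dropout was not applied
    use_layer_norm: bool = False  # layer normalization was not applied
    num_classes: int = 1000       # ImageNet classes


# Size names and layer counts as listed in the results table.
SIZES = {"tiny": 1, "base": 2, "small": 3, "medium": 4}


def build_grid():
    """Return one config per (size, variant) pair: 4 sizes x 2 variants."""
    return [
        MiceViTConfig(size=size, num_layers=layers, attention_only=attn_only)
        for (size, layers), attn_only in product(SIZES.items(), (True, False))
    ]


if __name__ == "__main__":
    for cfg in build_grid():
        variant = "AttentionOnly" if cfg.attention_only else "Attention+MLP"
        print(f"{cfg.size:<6} layers={cfg.num_layers} {variant}")
```

Each of the eight entries corresponds to one cell pair in the results table.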

Table of Results

The table below reports the accuracy [ <Acc> | <Top5 Acc> ] of Mice ViTs in each configuration.

Size	NumLayers	Attention+MLP	AttentionOnly	Model Link
tiny	1	0.16 | 0.33	0.11 | 0.25	AttentionOnly, Attention+MLP
base	2	0.23 | 0.44	0.16 | 0.34	AttentionOnly, Attention+MLP
small	3	0.28 | 0.51	0.17 | 0.35	AttentionOnly, Attention+MLP
medium	4	0.33 | 0.56	0.17 | 0.36	AttentionOnly, Attention+MLP
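
For readers unfamiliar with the two numbers in each cell, the following is a minimal sketch of how top-k accuracy is commonly computed, assuming <Acc> denotes top-1 accuracy. It is not the evaluation code used for these models; the `topk_accuracy` function and the toy scores are made up for illustration.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 6 classes (made-up scores, not model outputs).
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],  # true label 1 is ranked first
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # true label 2 is ranked second
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true label 5 is ranked last
]
labels = [1, 2, 5]

if __name__ == "__main__":
    print("top-1:", topk_accuracy(scores, labels, k=1))
    print("top-5:", topk_accuracy(scores, labels, k=5))
```

Top-5 accuracy is always at least as high as top-1, which matches the pattern in every row of the table.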