Model card for vit_large_patch14_clip_336.laion2b_ft_augreg_inat21

This model is part of a series of timm fine-tuning experiments on the iNaturalist 2021 competition data (https://github.com/visipedia/inat_comp/tree/master/2021) with higher-capacity models.

Covering 10,000 species, this dataset and these models are fun to explore via the classification widget with pictures from your backyard, but they are quite a bit smaller in scope than the models you can find on the iNaturalist website (https://www.inaturalist.org/blog/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa).

No extra metadata was used for training these models (as was the case for the competition); this was a straightforward fine-tune to explore differences in model pretraining data.

Model	Top-1	Top-5	Img Size (Train)	Paper
eva02_large_patch14_clip_336.merged2b_ft_inat21	92.05	98.01	336	https://arxiv.org/abs/2303.11331
vit_large_patch14_clip_336.datacompxl_ft_augreg_inat21	91.98	98.03	336	https://arxiv.org/abs/2304.14108
vit_large_patch14_clip_336.laion2b_ft_augreg_inat21	91.48	97.89	336	https://arxiv.org/abs/2212.07143
convnext_large_mlp.laion2b_ft_augreg_inat21	90.95	97.68	448 (384)
vit_large_patch14_clip_336.datacompxl_ft_inat21	90.85	97.68	336	https://arxiv.org/abs/2304.14108
convnext_large_mlp.laion2b_ft_augreg_inat21	90.62	97.61	384
vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21	90.29	97.44	336	https://arxiv.org/abs/2212.07143
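
The Top-1 and Top-5 columns above are accuracies in percent. As a rough illustration of what those numbers mean, here is a minimal sketch of the conventional top-k accuracy computation in plain Python. It is not timm's implementation; the function name `topk_accuracy` and the toy scores are made up for this example.

```python
def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores:  list of per-sample score lists (one score per class)
    targets: list of true class indices, one per sample
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


# Toy data (hypothetical): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # best class 1, true class 1
    [0.5, 0.3, 0.2],  # best class 0, true class 1 (second best)
    [0.2, 0.3, 0.5],  # best class 2, true class 0 (ranked last)
    [0.6, 0.3, 0.1],  # best class 0, true class 0
]
targets = [1, 1, 0, 0]

top1 = topk_accuracy(scores, targets, k=1)
top2 = topk_accuracy(scores, targets, k=2)
print(top1, top2)
```

With 10,000 classes, Top-5 counts a prediction as correct when the true species is anywhere among the five highest-scoring classes, which is why it is consistently higher than Top-1 in the table.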

Run Validation

python validate.py /tfds/ --dataset tfds/i_naturalist2021 --model hf-hub:timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21 --split val --amp
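
Outside of validate.py, the same checkpoint can also be loaded directly for single-image inference. The following is a sketch under stated assumptions, not an official recipe: it assumes `timm`, `torch`, and `Pillow` are installed, that the weights can be downloaded from the Hugging Face Hub, and that the image path passed in is a local file. The helper name `classify` is hypothetical.

```python
MODEL_NAME = "hf-hub:timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21"


def classify(image_path, top_k=5):
    """Return [(class_index, probability), ...] for the top_k predictions."""
    # Imported here so that merely defining the helper does not require
    # the heavy dependencies (assumed installed when the function is called).
    import timm
    import torch
    from PIL import Image

    # Downloads the fine-tuned weights from the Hub on first use.
    model = timm.create_model(MODEL_NAME, pretrained=True).eval()

    # Resolve the preprocessing the checkpoint expects (input size, mean/std)
    # and build the matching evaluation transform.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(transform(image).unsqueeze(0))  # add a batch dimension

    probs = logits.softmax(dim=-1)
    top_probs, top_idx = probs.topk(top_k, dim=-1)
    return list(zip(top_idx[0].tolist(), top_probs[0].tolist()))
```

Calling `classify("backyard.jpg")` (a hypothetical file name) would return class indices with their probabilities. Mapping those indices to species names depends on the dataset's label ordering and is not covered here.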

Citation

@inproceedings{cherti2023reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2818--2829},
  year={2023}
}