VisionLLaMA-Large-MAE

VisionLLaMA-Large-MAE is pretrained on ImageNet-1K without labels, following the Masked Autoencoder (MAE) paradigm. It retains its improvements on ImageNet-1K classification under both supervised fine-tuning (SFT) and linear probing.

| Model | ImageNet Acc (SFT, %) | ImageNet Acc (Linear Probe, %) |
|---|---|---|
| VisionLLaMA-Large-MAE (ep800) | 85.5 | 77.3 |
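
For readers unfamiliar with the MAE paradigm mentioned above: pretraining hides a random subset of image patches and trains the model to reconstruct them from the visible ones. The snippet below is a minimal sketch of that random-masking step only. The function name `random_patch_mask`, the 75% mask ratio, and the 224x224 image with 16x16 patches are illustrative assumptions; none of them is confirmed as this model's configuration, and this is not the VisionLLaMA training code.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets (hypothetical helper).

    mask_ratio is the fraction of patches hidden from the encoder;
    0.75 is an assumed value, not taken from this model card.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])  # patches the encoder sees
    masked = sorted(order[num_keep:])   # patches to be reconstructed
    return visible, masked


# Assumed sizes: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))
```

With these assumed values the encoder would see 49 of the 196 patches, and the reconstruction loss would be computed on the remaining 147.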

How to Use

Please refer to the GitHub page for usage.

Citation

@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}