AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

TL;DR

AnyAttack is an adversarial attack model that transforms ordinary images into targeted adversarial examples capable of misleading Vision-Language Models (VLMs). Pre-trained on the LAION-400M dataset, it can perturb a benign image (e.g., a dog) so that VLMs misinterpret it as any specified content (e.g., "this is violent content"), and the attack works against both open-source and commercial models.

Model Overview

AnyAttack is designed to generate adversarial examples efficiently and at scale. Unlike traditional adversarial methods, it does not require predefined labels and instead leverages a self-supervised adversarial noise generator trained on large-scale data.
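The snippet below is a minimal PyTorch sketch of this generator-based paradigm, included only to illustrate the idea. The encoder choice, generator architecture, checkpoint name (`anyattack_generator.pt`), input file names, and perturbation budget are all illustrative assumptions, not the released AnyAttack interface.

```python
# Minimal sketch of a generator-based targeted attack (illustrative only).
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

EPSILON = 8 / 255  # assumed L-infinity perturbation budget

# Frozen image encoder used to embed the target image
# (a stand-in for the CLIP-style encoder used by the method).
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder.fc = nn.Identity()              # expose pooled 2048-d features
encoder.eval().requires_grad_(False)

# Hypothetical noise generator: maps a target embedding to an
# image-shaped perturbation. In practice this would be the
# pre-trained AnyAttack generator.
class NoiseGenerator(nn.Module):
    def __init__(self, embed_dim=2048, image_size=224):
        super().__init__()
        self.image_size = image_size
        self.decode = nn.Sequential(
            nn.Linear(embed_dim, 3 * image_size * image_size),
            nn.Tanh(),                  # keep raw noise in [-1, 1]
        )

    def forward(self, target_embedding):
        noise = self.decode(target_embedding)
        return noise.view(-1, 3, self.image_size, self.image_size)

generator = NoiseGenerator().eval()
# generator.load_state_dict(torch.load("anyattack_generator.pt"))  # hypothetical checkpoint

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

clean = preprocess(Image.open("dog.jpg")).unsqueeze(0)      # benign input (placeholder file)
target = preprocess(Image.open("target.jpg")).unsqueeze(0)  # image whose semantics the VLM should "see"

with torch.no_grad():
    noise = generator(encoder(target))
    # Scale the noise to the L-infinity budget and keep pixels valid.
    adv = torch.clamp(clean + EPSILON * noise, 0.0, 1.0)
```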

For a detailed explanation of the AnyAttack framework and methodology, please visit our Project Page.

πŸ”— Links & Resources

πŸ“œ Citation

If you use AnyAttack in your research, please cite our work:

@inproceedings{zhang2025anyattack,
    title={AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models},
    author={Zhang, Jiaming and Ye, Junhong and Ma, Xingjun and Li, Yige and Yang, Yunfan and Chen, Yunhao and Sang, Jitao and Yeung, Dit-Yan},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}

⚠️ Disclaimer

This model is intended for research purposes only. The misuse of adversarial attacks can have ethical and legal implications. Please use responsibly.


⭐ If you find this model useful, please give it a star on Hugging Face! ⭐
