June 10, 2025

AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Abstract

Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks. Traditional targeted adversarial attacks require specific targets and labels, limiting their real-world impact. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks through a novel foundation model approach. By pretraining on the massive LAION-400M dataset without label supervision, AnyAttack achieves unprecedented flexibility, enabling any image to be transformed into an attack vector targeting any desired output across different VLMs. This approach fundamentally changes the threat landscape, making adversarial capabilities accessible at an unprecedented scale. Our extensive validation across five open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) demonstrates AnyAttack’s effectiveness across diverse multimodal tasks. Most concerning, AnyAttack seamlessly transfers to commercial systems including Google Gemini, Claude Sonnet, Microsoft Copilot, and OpenAI GPT, revealing a systemic vulnerability that requires immediate attention.
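The abstract describes turning an image into an attack that pulls a model's output toward an arbitrary target. As a rough illustration of that general idea only (this is not the AnyAttack framework, whose details this abstract does not give), the sketch below runs a standard targeted L-infinity projected gradient descent (PGD) attack in embedding space. It assumes a toy linear "encoder" `W` in place of a real VLM image encoder, and the names `targeted_pgd`, `eps`, `step`, and `iters` are hypothetical choices for this example.

```python
import random


def mat_vec(W, v):
    """Return W @ v, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mat_t_vec(W, u):
    """Return W^T @ u, with W stored as a list of rows."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def embedding_distance(W, a, b):
    """Squared Euclidean distance between the toy embeddings W @ a and W @ b."""
    return sum((p - q) ** 2 for p, q in zip(mat_vec(W, a), mat_vec(W, b)))


def targeted_pgd(W, x, target, eps=0.1, step=0.01, iters=100):
    """L-infinity PGD pulling the embedding of x + delta toward that of target.

    Hypothetical toy setup: W stands in for an image encoder. Returns the
    perturbed input with the smallest embedding distance seen.
    """
    target_emb = mat_vec(W, target)
    delta = [0.0] * len(x)
    best_adv, best_loss = list(x), embedding_distance(W, x, target)
    for _ in range(iters):
        adv = [xi + di for xi, di in zip(x, delta)]
        residual = [p - q for p, q in zip(mat_vec(W, adv), target_emb)]
        # Gradient of the squared distance is 2 * W^T @ residual; only its sign matters.
        grad = mat_t_vec(W, residual)
        for j, g in enumerate(grad):
            d = delta[j] - step * sign(g)        # signed descent step
            d = max(-eps, min(eps, d))           # project onto the eps-ball
            d = max(-x[j], min(1.0 - x[j], d))   # keep x + delta inside [0, 1]
            delta[j] = d
        adv = [xi + di for xi, di in zip(x, delta)]
        loss = embedding_distance(W, adv, target)
        if loss < best_loss:
            best_adv, best_loss = adv, loss
    return best_adv


if __name__ == "__main__":
    random.seed(0)
    W = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
    x = [random.random() for _ in range(16)]
    target = [random.random() for _ in range(16)]
    adv = targeted_pgd(W, x, target)
    print(embedding_distance(W, x, target), embedding_distance(W, adv, target))
```

This loop optimizes one perturbation for one (image, target) pair at a time. It does not reproduce the large-scale self-supervised pretraining on LAION-400M that the abstract credits for AnyAttack's flexibility.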

Authors

Jiaming Zhang

Junhong Ye

Xingjun Ma

Institutions

Fudan University

Hong Kong University of Science and Technology

Singapore Management University
