What question did this study set out to answer?

This research explores how vulnerable large multimodal models (LMMs) are to adversarial attacks and develops an effective attack strategy against them.

May 21, 2026 · Open Access

Adaptive ensemble attack: breaking Large Multimodal Models via dynamic caption selection and weighted gradients

Key Points

Introduces the Adaptive Ensemble PGD (AE-PGD) attack, which targets the CLIP and EVA-CLIP encoders simultaneously.
Uses dynamic adversarial caption selection, combining gradient magnitude with global semantic displacement, to pick the most attack-effective caption per model.
Applies an Expectation over Transforms (EoT) gradient update for robustness against input-transformation defenses.
AE-PGD reduces accuracy from a 75.42% baseline to 0.0% across all three evaluation metrics: visual encoding, image-to-text recall, and LLM answer recall.
Manifold analysis shows that adversarial perturbations push image embeddings to antipodal regions of the joint embedding space.
Black-box transfer yields a 65 percentage point recall collapse on unseen encoders.
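
The attack loop behind these points can be illustrated with a minimal sketch. It is not the authors' implementation: the two "encoders" are toy random linear maps, the loss is a plain dot-product alignment with an adversarial caption embedding, the EoT transform is a random per-pixel brightness jitter, and the ensemble weights are fixed at 0.5 each. All names (`eot_ensemble_grad`, `pgd_attack`, `weighted_alignment`) and all parameter values are hypothetical.

```python
import random

def matvec(W, x):
    """Apply a toy linear 'encoder' W (a list of rows) to a flat image x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def alignment(W, x, target):
    """Dot product between the image embedding W @ x and a caption embedding."""
    return sum(z * t for z, t in zip(matvec(W, x), target))

def eot_ensemble_grad(encoders, targets, weights, x, rng, samples=8):
    """EoT gradient: average over random brightness jitters s of the weighted
    sum of d/dx alignment(W, s * x, target) = s * (W^T target)."""
    grad = [0.0] * len(x)
    for _ in range(samples):
        s = [rng.uniform(0.9, 1.1) for _ in x]  # one transform shared by all encoders
        for W, t, w in zip(encoders, targets, weights):
            for j in range(len(x)):
                wt_j = sum(W[i][j] * t[i] for i in range(len(W)))
                grad[j] += w * s[j] * wt_j / samples
    return grad

def pgd_attack(encoders, targets, weights, x, eps=0.03, alpha=0.01, steps=10, seed=0):
    """Targeted L-infinity PGD: ascend the weighted alignment with the
    adversarial caption embeddings while staying within eps of x."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(steps):
        g = eot_ensemble_grad(encoders, targets, weights, x_adv, rng)
        for j in range(len(x)):
            step = alpha * ((g[j] > 0) - (g[j] < 0))         # sign-gradient step
            moved = x_adv[j] + step
            moved = min(max(moved, x[j] - eps), x[j] + eps)  # project into the eps-ball
            x_adv[j] = min(max(moved, 0.0), 1.0)             # keep a valid pixel range
    return x_adv

# Toy setup (assumption): two random linear encoders and two caption embeddings.
rng = random.Random(1)
dim_in, dim_out = 16, 4
enc_a = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
enc_b = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
cap_a = [rng.gauss(0, 1) for _ in range(dim_out)]
cap_b = [rng.gauss(0, 1) for _ in range(dim_out)]
image = [0.5] * dim_in
weights = [0.5, 0.5]

adv = pgd_attack([enc_a, enc_b], [cap_a, cap_b], weights, image)

def weighted_alignment(x):
    """Ensemble objective evaluated on an untransformed input."""
    pairs = zip([enc_a, enc_b], [cap_a, cap_b], weights)
    return sum(w * alignment(W, x, t) for W, t, w in pairs)

print(weighted_alignment(image), weighted_alignment(adv))
```

In this toy setting the perturbed image stays within `eps` of the original while the weighted alignment with the adversarial captions rises. With real nonlinear encoders the gradient would come from automatic differentiation rather than the closed form used here.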

Abstract

Large Multimodal Models (LMMs) have achieved remarkable performance across vision-language tasks, yet their robustness against adversarial attacks remains critically underexplored. While LMMs are vulnerable to visual encoder attacks, they exhibit surprising resilience due to encoder diversity: attacks optimized for CLIP fail to transfer to EVA-CLIP, especially when textual context is provided. We introduce the Adaptive Ensemble PGD (AE-PGD) attack, which simultaneously targets both encoders through three key innovations: (1) dynamic adversarial caption selection, combining gradient magnitude with global semantic displacement to identify the most attack-effective caption per model; (2) an adaptive weight controller, dynamically balancing each encoder’s contribution using real-time loss, gradient norm, and confidence metrics; and (3) an Expectation over Transforms (EoT) gradient update ensuring robustness against input-transformation defenses. Evaluated on COCO 2014 images, AE-PGD reduces accuracy from a 75.42% baseline to 0.0% across all three evaluation metrics (visual encoding, image-to-text recall, and LLM answer recall), achieving complete model collapse. Manifold analysis confirms that adversarial perturbations push image embeddings to antipodal regions of the joint embedding space, activating semantically opposite concept clusters and producing structured hallucinations. WordNet WUP similarity analysis reveals a 33.5 percentage point semantic drop across the test set. AE-PGD causes state-of-the-art LMMs (LLaVA, Qwen-VL, GPT-4V) to catastrophically misidentify a bullet train as a “helicopter crash,” with strong black-box transfer yielding a 65 percentage point recall collapse on unseen encoders. This work exposes critical vulnerabilities in current LMM architectures and underscores the urgent need for ensemble-aware defense mechanisms.
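
The abstract names the signals used for caption selection and encoder weighting but not the formulas. The sketch below is one plausible reading, offered only as an illustration. The scoring rules, the `mix` and `temperature` parameters, the interpretation of "confidence" as a scalar per encoder, and the function names `select_caption` and `adaptive_weights` are all assumptions rather than details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def select_caption(clean_emb, caption_embs, grad_norms, mix=0.5):
    """Pick the candidate adversarial caption with the highest combined score.

    Hypothetical score: mix * (gradient norm scaled to [0, 1])
    + (1 - mix) * semantic displacement, where displacement is
    (1 - cosine(clean_emb, caption_emb)) / 2, also in [0, 1].
    """
    top = max(grad_norms) or 1.0  # avoid dividing by zero if all norms are 0
    scores = [
        mix * (g / top) + (1 - mix) * (1 - cosine(clean_emb, c)) / 2
        for c, g in zip(caption_embs, grad_norms)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

def adaptive_weights(losses, grad_norms, confidences, temperature=1.0):
    """Softmax over a hypothetical per-encoder 'resistance' score.

    An encoder with a high remaining attack loss and high confidence but a
    small gradient norm receives a larger share of the ensemble weight.
    """
    scores = [l * c / (g + 1e-8) for l, g, c in zip(losses, grad_norms, confidences)]
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]  # stable softmax
    total = sum(exps)
    return [e / total for e in exps]

# Toy inputs (assumption): 2-D embeddings and made-up gradient norms.
clean = [1.0, 0.0]
captions = [[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
best, caption_scores = select_caption(clean, captions, grad_norms=[0.2, 0.8, 1.0])

# Two encoders: the first still resists the attack (higher loss), so it gets more weight.
w = adaptive_weights(losses=[2.0, 0.5], grad_norms=[1.0, 1.0], confidences=[0.9, 0.9])
print(best, w)
```

In this toy example the caption pointing in the opposite direction from the clean embedding wins the selection, which mirrors the "antipodal" behaviour the abstract describes, and the weights always sum to one so they can scale per-encoder gradients directly.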

Cite This Study

Pandey et al. studied this question.

synapsesocial.com/papers/6a0ea15cbe05d6e3efb5ff0b https://doi.org/10.1007/s00371-026-04480-4
