What type of study is this?

This is an experimental study.

September 17, 2025

Exploiting feature-rich image locations for adversarial attacks on image classifiers without network access

Key Points

Adversarial attacks can target feature-rich image regions to remain effective against networks outside the training set.
The Multi-Targeted Gradient Training approach utilizes multiple pretrained classifiers to capture diverse features.
An order-based loss function optimizes training by focusing on the most salient pixels in the combined gradients.
The method shows promise for generating successful attacks that generalize across various unseen architectures.
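
The key points describe combining gradients from several pretrained classifiers so that no single network's features dominate. The summary does not say how the gradients are combined, so the following is only a minimal sketch under an assumed scheme. Each classifier's per-pixel gradient map (flattened to a list) is reduced to absolute values and scaled by its own maximum. The scaled maps are then averaged. The function name `combine_gradients` and the max-normalization choice are hypothetical and are not taken from the paper.

```python
from typing import List


def combine_gradients(grad_maps: List[List[float]]) -> List[float]:
    """Combine per-pixel gradient maps from several classifiers into one saliency map.

    Hypothetical scheme (an assumption, not the paper's method): take absolute
    values, scale each map by its own peak so classifiers with large raw
    gradients do not dominate, then average across classifiers.
    """
    if not grad_maps:
        raise ValueError("need at least one gradient map")
    n = len(grad_maps[0])
    if n == 0 or any(len(g) != n for g in grad_maps):
        raise ValueError("gradient maps must be non-empty and equally sized")

    combined = [0.0] * n
    for g in grad_maps:
        mags = [abs(v) for v in g]
        peak = max(mags)
        if peak == 0:
            continue  # an all-zero map carries no saliency information
        for i, m in enumerate(mags):
            combined[i] += m / peak  # per-classifier normalization to [0, 1]
    return [c / len(grad_maps) for c in combined]


# Two toy "classifiers" over a 4-pixel image; model B's raw gradients are larger.
print(combine_gradients([[0.5, -2.0, 0.25, 0.0], [4.0, 0.0, -8.0, 2.0]]))
# → [0.375, 0.5, 0.5625, 0.125]
```

Normalizing before averaging is one simple way to keep a single architecture's gradient scale from overriding the others. In a real pipeline these maps would come from backpropagating each classifier's loss to the input image.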

Abstract

Physical adversarial attacks have advanced rapidly, with numerous methods developed to overcome the challenge of applying perturbations in real-world settings. However, less attention has been given to the challenge of information access. Most adversarial attacks operate in white-box settings or information-constrained black-box scenarios. Although prior work has explored universal adversarial examples and attacks without direct access to target networks, existing literature does not support the broad application of pre-existing adversarial methods in what we introduce as the "box-agnostic scenario". Unlike the black-box setting, which assumes access to both inputs and outputs of the target network, the box-agnostic scenario assumes knowledge only of the input image, with no access to classification outputs. To address this challenge, we introduce Multi-Targeted Gradient Training (MTGT), a novel approach that leverages encoder-decoder architectures trained on the combined gradients of multiple pretrained classifiers. By incorporating diverse architectures, MTGT captures a wide range of feature detectors, allowing feature-rich regions to emerge naturally during training. Additionally, we introduce a novel order-based loss function that optimizes training by emphasizing the most salient pixels in the combined gradients, guiding the network to focus on features most critical to successful attacks. This process enables the network to identify and exploit high-information areas within an image, facilitating adversarial attacks that target these regions rather than relying on any single network's gradients. We evaluate MTGT's effectiveness by testing its adversarial capabilities against networks outside the set used during training, demonstrating its potential for generating attacks that generalize across unseen architectures.
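
The abstract's order-based loss emphasizes the most salient pixels in the combined gradients, but its exact form is not given here. As a hedged illustration only, the sketch below uses a hypothetical rank-weighted loss. Pixels are sorted by saliency and the top `k` are kept. The pixel at rank r (0 = most salient) gets weight k - r. The loss penalizes the generator for leaving those pixels unperturbed. The names `order_based_loss`, `top_k_indices`, and `mask` are assumptions: `mask` stands in for the encoder-decoder's per-pixel perturbation strength in [0, 1]. Actual training would express this in an autodiff framework so gradients reach the network; plain Python here only shows the arithmetic.

```python
from typing import List


def top_k_indices(saliency: List[float], k: int) -> List[int]:
    """Pixel indices ordered from most to least salient, truncated to k.

    Python's sort is stable (also with reverse=True), so ties keep index order.
    """
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return order[:k]


def order_based_loss(mask: List[float], saliency: List[float], k: int) -> float:
    """Hypothetical rank-weighted loss over the k most salient pixels.

    mask[i] in [0, 1] is the perturbation strength at pixel i. The pixel at
    rank r gets weight k - r, so leaving a top-ranked pixel unperturbed costs
    more than leaving a lower-ranked one. Result lies in [0, 1]; it is 0 when
    every top-k pixel is fully perturbed.
    """
    if len(mask) != len(saliency):
        raise ValueError("mask and saliency must have the same length")
    if not 0 < k <= len(saliency):
        raise ValueError("k must be between 1 and the number of pixels")
    top = top_k_indices(saliency, k)
    weights = [k - r for r in range(k)]  # rank 0 -> k, rank k-1 -> 1
    total = sum(weights)
    return sum(w * (1.0 - mask[i]) for w, i in zip(weights, top)) / total


saliency = [0.375, 0.5, 0.5625, 0.125]  # example combined saliency map
print(order_based_loss([0.0, 0.0, 1.0, 0.0], saliency, k=2))  # → 0.3333333333333333
print(order_based_loss([0.0, 1.0, 0.0, 0.0], saliency, k=2))  # → 0.6666666666666666
```

With `k = 2`, perturbing only the most salient pixel costs less than perturbing only the second-ranked one. This shows that rank order, not just membership in the top set, shapes the loss.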
