Transferable adversarial examples (AEs) have attracted considerable attention because they expose vulnerabilities in black-box deep neural networks (DNNs). However, achieving high transferability for targeted attacks remains a challenge. In this paper, inspired by the observation that AEs with smaller intra-class distances and larger inter-class distances tend to exhibit higher transferability, we propose a novel targeted attack based on Feature Contrastive Optimization (FCO). This attack enhances adversarial transferability by minimizing intra-class distances and maximizing inter-class distances. Specifically, we first define the positive samples (belonging to the target class) and negative samples (belonging to non-target classes) that correspond to targeted AEs. Using these positive and negative samples, we then propose two metrics, Intra-class Compactness (IC) and Inter-class Separability (IS), and combine them into a novel Feature Contrastive (FC) loss. When this plug-and-play FC loss is integrated into standard adversarial objectives, the generated AEs are encouraged to align more closely with the target class distribution while diverging from the distributions of non-target classes. Extensive experiments on the ImageNet-compatible dataset demonstrate that our approach consistently improves targeted transferability across a broad range of DNN architectures.
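To make the idea of combining IC and IS concrete, the following is a minimal sketch and not the formulation used in the paper, since the abstract gives no formulas. It assumes that IC is the mean cosine distance between the AE's feature vector and the positive-sample features, that IS is the mean cosine distance to the negative-sample features, and that the FC loss is their weighted difference. The function names, the choice of cosine distance, and the `weight` parameter are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(feature, samples):
    """Average cosine distance from one feature vector to a set of samples."""
    return sum(cosine_distance(feature, s) for s in samples) / len(samples)


def fc_loss(adv_feature, positives, negatives, weight=1.0):
    """Hypothetical FC loss: pull toward positives, push away from negatives.

    Assumption: IC and IS are mean cosine distances and the loss is
    IC - weight * IS. The paper's actual definitions may differ.
    """
    ic = mean_distance(adv_feature, positives)    # intra-class: smaller is better
    is_ = mean_distance(adv_feature, negatives)   # inter-class: larger is better
    return ic - weight * is_, ic, is_


# Toy 2-D features: the AE already points along the target-class direction.
adv = [1.0, 0.0]
positives = [[1.0, 0.0], [2.0, 0.0]]   # target-class features
negatives = [[0.0, 1.0]]               # non-target-class features
loss, ic, is_ = fc_loss(adv, positives, negatives)
print(loss, ic, is_)  # -1.0 0.0 1.0
```

In an attack loop, a term like this would be added to the standard targeted objective and minimized with respect to the perturbation. Here the features are plain lists so that the arithmetic is easy to follow. Zero-norm feature vectors are not handled.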
Wang et al. studied this question.