Deep neural networks are vulnerable to transferable adversarial examples in black-box scenarios, and targeted attacks, which mislead models into predicting specific classes, pose particularly severe threats. Feature mixup attacks enhance adversarial transferability by injecting clean features to perturb intermediate representations, yet existing methods suffer from two fundamental limitations. First, coarse-grained global mixing strategies apply a shared mixing ratio uniformly across all spatial positions and fix the clean reference for each position to the corresponding location in the clean image. This limits the diversity of mixed feature representations during optimization and, in turn, the transferability of the generated adversarial examples. Second, standard momentum-based optimization over-aligns with the surrogate model's gradient geometry, suppressing the gradient variations needed to escape model-specific local minima. To address both limitations, we propose Fine-grained Feature Mixup Perturbation and Reference-based Gradient Refinement (FMGR), which consists of two components: Fine-grained Feature Mixup (FFM) and Reference-based Gradient Refinement (RGR). FFM partitions feature maps into spatially disjoint blocks and mixes each block with clean features drawn from spatially shuffled positions of the same image, breaking the fixed spatial correspondence of clean references and producing more generalizable feature perturbations. RGR selectively amplifies deviations between instantaneous and reference gradients, suppressing surrogate-specific dominant directions and steering optimization toward flatter, more transferable loss regions. Extensive experiments on ImageNet and CIFAR-10 demonstrate that FMGR significantly outperforms state-of-the-art methods against both CNN and Vision Transformer architectures while maintaining computational efficiency.
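To make the two components concrete, the following is a minimal, non-authoritative sketch in pure Python. It is not the authors' implementation: the abstract does not specify the block size, how per-block mixing ratios are chosen, how the reference gradient is defined, or the exact amplification rule. The sketch therefore assumes (1) a single-channel H×W feature map whose sides are divisible by the block size, (2) a per-block ratio drawn uniformly from [0, `max_ratio`], (3) the accumulated momentum serving as the reference gradient, and (4) amplification only of deviation components that oppose the reference sign, followed by an MI-FGSM-style L1-normalized momentum update (a swapped-in, standard technique). All function and parameter names (`fine_grained_feature_mixup`, `refine_gradient`, `momentum_step`, `block`, `max_ratio`, `beta`, `mu`) are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def fine_grained_feature_mixup(adv: Matrix, clean: Matrix, block: int,
                               max_ratio: float, rng: random.Random) -> Matrix:
    """Hypothetical FFM-style mixing on a single-channel H x W feature map.

    The map is split into disjoint block x block tiles. Each adversarial tile
    is mixed with a clean tile taken from a shuffled position, using its own
    ratio drawn from [0, max_ratio] (the per-tile ratio is an assumption).
    """
    h, w = len(adv), len(adv[0])
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Top-left corners of all disjoint tiles, in raster order.
    tiles = [(r, c) for r in range(0, h, block) for c in range(0, w, block)]
    sources = tiles[:]
    rng.shuffle(sources)  # break the fixed spatial correspondence of clean references
    out = [row[:] for row in adv]
    for (r, c), (sr, sc) in zip(tiles, sources):
        ratio = rng.uniform(0.0, max_ratio)  # hypothetical per-tile mixing ratio
        for i in range(block):
            for j in range(block):
                out[r + i][c + j] = ((1.0 - ratio) * adv[r + i][c + j]
                                     + ratio * clean[sr + i][sc + j])
    return out


def refine_gradient(grad: List[float], ref: List[float], beta: float) -> List[float]:
    """Hypothetical RGR-style refinement of a flattened gradient.

    Components whose deviation (grad - ref) points against the reference
    direction are amplified by beta; all other components are left unchanged.
    """
    refined = []
    for g, r in zip(grad, ref):
        d = g - r
        if d * r < 0:  # deviation opposes the reference (dominant) direction
            refined.append(g + beta * d)
        else:
            refined.append(g)
    return refined


def momentum_step(momentum: List[float], grad: List[float],
                  mu: float, beta: float) -> List[float]:
    """MI-FGSM-style accumulation using the current momentum as the reference."""
    refined = refine_gradient(grad, momentum, beta)
    l1 = sum(abs(x) for x in refined) or 1.0  # avoid division by zero
    return [mu * m + x / l1 for m, x in zip(momentum, refined)]


# Usage: mix a toy 4x4 feature map in 2x2 tiles, then run two momentum steps.
rng = random.Random(0)
adv = [[1.0] * 4 for _ in range(4)]
clean = [[float(4 * i + j) for j in range(4)] for i in range(4)]
mixed = fine_grained_feature_mixup(adv, clean, block=2, max_ratio=0.5, rng=rng)

m = [0.0, 0.0, 0.0]
for g in ([1.0, -2.0, 0.5], [-1.0, -1.0, 1.0]):
    m = momentum_step(m, g, mu=1.0, beta=0.5)
```

Shuffling tile positions is what distinguishes this from a global mixup in which every position sees its own clean counterpart; the selective rule in `refine_gradient` is only one plausible reading of "selectively amplifies deviations" and could be replaced by, for example, a magnitude threshold.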
Gao et al. studied this question.