Existing visual tracking models generally lack robustness under adversarial attacks, especially in complex scenarios spanning many consecutive frames, where even small perturbations can cause a persistent bias in the model's predictions and severely degrade tracking performance. To address this issue, this paper proposes REF-DUNet, an adversarial defense method for visual tracking that combines receptive field enhancement fusion with a denoising U-Net. It aims to make trackers more robust in perturbed environments and to mitigate the interference caused by adversarial attacks. Built on a U-shaped encoder-decoder architecture, the method introduces a multi-branch receptive field enhancement fusion module that fuses standard, asymmetric, and dilated convolutions in parallel, strengthening the network's ability to learn and preserve features under adversarial perturbations at multiple scales. To improve the structural integrity and semantic consistency of the denoised image, REF-DUNet is trained with a combination of mean squared error (MSE) loss and perceptual loss, jointly optimizing in both the low-level (pixel) and high-level (feature) spaces. Notably, REF-DUNet requires no access to the target tracker's architecture or gradients, which makes it general, independent of any particular tracker, and transferable across models. We apply REF-DUNet to two representative trackers and evaluate it on four well-known datasets against both white-box and black-box attacks. The results show that our method significantly improves tracker robustness under adversarial attacks and effectively restores tracking performance.
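The abstract names the ingredients of the fusion module and the training objective but not their exact configuration. The following NumPy sketch illustrates the general idea under several assumptions that are not taken from REF-DUNet itself: it operates on a single-channel map, realizes the asymmetric branch as a 1x3 convolution followed by a 3x1 convolution, uses dilation rates of 2 and 3, fuses the branches with a 1x1 convolution (which reduces to a weighted sum when each branch outputs one channel), and weights the perceptual term with a hypothetical `lam=0.1`. The Laplacian `edge_features` function is only a stand-in for the pretrained feature extractor that a perceptual loss normally relies on.

```python
import numpy as np


def conv2d(x, w, dilation=1):
    """'Same'-padded 2-D cross-correlation of a single-channel map x with kernel w."""
    kh, kw = w.shape
    # The effective kernel extent grows with the dilation rate.
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    ph, pw = eh // 2, ew // 2
    xp = np.pad(x, ((ph, eh - 1 - ph), (pw, ew - 1 - pw)))
    h, wd = x.shape
    out = np.zeros((h, wd), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap samples the input at a spacing of `dilation`.
            out += w[i, j] * xp[i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
    return out


def receptive_field_fusion(x, kernels, fusion_weights):
    """Parallel standard, asymmetric, and dilated branches fused by a 1x1 conv.

    Branch layout and dilation rates are assumptions for illustration.
    """
    branches = [
        conv2d(x, kernels["std3x3"]),                                # standard 3x3
        conv2d(conv2d(x, kernels["asym1x3"]), kernels["asym3x1"]),  # asymmetric 1x3 -> 3x1
        conv2d(x, kernels["dil3x3_r2"], dilation=2),                 # dilated, rate 2 (assumed)
        conv2d(x, kernels["dil3x3_r3"], dilation=3),                 # dilated, rate 3 (assumed)
    ]
    stacked = np.stack(branches)  # shape (4, H, W)
    # With one channel per branch, a 1x1 fusion conv is a weighted sum over branches.
    return np.tensordot(fusion_weights, stacked, axes=1)


def joint_loss(denoised, clean, feature_fn, lam=0.1):
    """Pixel-space MSE plus a perceptual (feature-space) MSE; lam is hypothetical."""
    mse = np.mean((denoised - clean) ** 2)
    perceptual = np.mean((feature_fn(denoised) - feature_fn(clean)) ** 2)
    return mse + lam * perceptual


def edge_features(img):
    # Stand-in for a pretrained feature extractor (hypothetical, for illustration only).
    laplacian = np.array([[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]])
    return conv2d(img, laplacian)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
kernels = {
    name: rng.standard_normal(shape) * 0.1
    for name, shape in [("std3x3", (3, 3)), ("asym1x3", (1, 3)), ("asym3x1", (3, 1)),
                        ("dil3x3_r2", (3, 3)), ("dil3x3_r3", (3, 3))]
}
y = receptive_field_fusion(x, kernels, np.full(4, 0.25))
print(y.shape)  # (16, 16)
print(joint_loss(y, x, edge_features))
```

Running the dilated branches alongside the 3x3 and asymmetric ones lets each output pixel draw on several receptive field sizes at once, which is the intuition behind handling perturbations at multiple scales; in a real network each branch would carry many channels and learned weights.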
Gao et al. studied this question.