Although Deep Neural Networks (DNNs) have established a solid place in many applications, their opaque inner workings impede their use in sensitive applications. Interpretability-based methods try to overcome this issue by providing explanations for a model's predictions. Saliency Guided Training (SGT) is one such method; it directs the model's focus toward the most relevant features. This technique enhances the clarity of saliency maps, aiding a better understanding of the model's decision-making. This research investigates the robustness of the SGT algorithm against adversarial attacks. Although saliency-guided training promises enhanced interpretability for reliable application of DNNs, our investigation shows that this method increases the model's vulnerability to adversarial attacks. This study underscores the pressing need for researchers to balance clarity of interpretation with robustness to adversarial interventions. The outcome also shows the need for caution when deploying saliency-based DNNs in different applications. We employ diverse architectures such as a conventional CNN, ResNet-18, and the Tiny Transformer on popular datasets such as MNIST, CIFAR-10, CIFAR-100, and Caltech101 to substantiate our conclusion.
Karkehabadi et al. studied this question.
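To make "adversarial attack" concrete, the following is a minimal sketch of one common gradient-based attack, the Fast Gradient Sign Method (FGSM), applied to a toy logistic-regression classifier. The abstract does not say which attacks were evaluated, so the choice of FGSM is an assumption made purely for illustration, and the weights, input, and perturbation budget `eps` are hypothetical values, not taken from the paper. The input gradient used here is the same kind of quantity that gradient-based saliency maps are built from, which is one reason the two topics are closely linked.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this has the closed form (p - y) * w.
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Shift every feature by eps in the direction that increases the loss."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (illustrative values only).
w = [2.0, -1.0, 0.5]
b = 0.0
x = [0.3, 0.1, 0.2]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.25)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)

print("clean P(class 1):      ", round(p_clean, 3))
print("adversarial P(class 1):", round(p_adv, 3))
```

In this toy setting a perturbation of at most 0.25 per feature is enough to push the predicted probability of the true class below 0.5. Evaluating a deep network follows the same pattern, with the input gradient obtained by automatic differentiation instead of a closed form.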