Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of image classifiers? In this paper, we show empirically that adversarial examples mainly lie in the low-probability regions of the training distribution, regardless of attack types and targeted models. Using statistical hypothesis testing, we find that modern density models are surprisingly good at detecting imperceptible image perturbations. Based on this discovery, we devised PixelDefend, a new approach that purifies a maliciously perturbed image by moving it back towards the distribution seen in the training data. The purified image is then run through an unmodified classifier, making our method agnostic to both the classifier and the attacking method. As a result, PixelDefend can be used to protect already deployed models and be combined with other model-specific defenses. Experiments show that our method greatly improves resilience across a wide variety of state-of-the-art attacking methods, increasing accuracy on the strongest attack from 63% to 84% for Fashion MNIST and from 32% to 70% for CIFAR-10.
Song et al. (Mon,) studied this question.
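The detection idea in the abstract, flagging inputs that fall in low-probability regions of the training distribution via a hypothesis test, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a diagonal Gaussian stands in for the density model, the data are synthetic vectors rather than images, the perturbation is deliberately large rather than imperceptible, and the rank-based p-value is one common construction rather than a confirmed description of the authors' test. The helper names (`fit_gaussian`, `log_likelihood`, `p_value`) are hypothetical.

```python
import numpy as np

def fit_gaussian(train):
    """Stand-in density model (assumption): diagonal Gaussian over flattened inputs."""
    mean = train.mean(axis=0)
    var = train.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var

def log_likelihood(x, mean, var):
    """Log-density of each row of x under the diagonal Gaussian."""
    x = np.atleast_2d(x)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def p_value(test_ll, train_ll):
    """Rank-based p-value: share of training samples no more likely than the test input."""
    rank = np.sum(train_ll <= test_ll)
    return (rank + 1) / (len(train_ll) + 1)

rng = np.random.default_rng(0)

# Synthetic "training set": 1000 vectors of 64 features (not real images).
train = rng.normal(0.5, 0.1, size=(1000, 64))
mean, var = fit_gaussian(train)
train_ll = log_likelihood(train, mean, var)

# A clean input drawn from the same distribution, and a perturbed copy.
clean = rng.normal(0.5, 0.1, size=64)
perturbed = clean + 0.3 * np.sign(rng.normal(size=64))

p_clean = p_value(log_likelihood(clean, mean, var)[0], train_ll)
p_perturbed = p_value(log_likelihood(perturbed, mean, var)[0], train_ll)

print(f"p-value (clean):     {p_clean:.4f}")
print(f"p-value (perturbed): {p_perturbed:.4f}")
```

A small p-value means the input is less likely under the density model than nearly all training samples, which is the signal a detector would threshold on. With a real image density model, the same ranking step would apply to the model's log-likelihoods in place of the Gaussian ones.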