Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small-volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for ℓ₂-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this work, we extend this theory to ℓ₀-bounded adversarial perturbations, where the attacker can modify a few pixels of the image but is unrestricted in the magnitude of perturbation, and we show necessary and sufficient conditions for the existence of ℓ₀-robust classifiers. Theoretical certification approaches in this regime essentially employ voting over a large ensemble of classifiers. Such procedures are combinatorial and expensive or require complicated certification techniques. In contrast, a simple classifier emerges from our theory, dubbed Box-NN, which naturally incorporates the geometry of the problem and improves upon the current state-of-the-art in certified robustness against sparse attacks for the MNIST and Fashion-MNIST datasets.
Pal et al. studied this question.
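
To make the ℓ₀ threat model in the abstract concrete, the following minimal sketch contrasts a sparse perturbation (a few pixels changed by a large amount) with a dense one (every pixel changed slightly). It is illustrative only and is not the paper's Box-NN classifier. The helper names `l0_distance` and `l2_distance`, the flattened 784-pixel image, and the specific pixel values are assumptions chosen for the example.

```python
import math

def l0_distance(x, x_adv):
    """Count the coordinates (pixels) at which the two inputs differ."""
    return sum(1 for a, b in zip(x, x_adv) if a != b)

def l2_distance(x, x_adv):
    """Euclidean distance between the two inputs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))

# Hypothetical flattened 28x28 image with all pixels set to 0.0.
x = [0.0] * 784

# Sparse attack: only three pixels change, but each by the full range.
x_sparse = list(x)
for i in (0, 203, 783):
    x_sparse[i] = 1.0

# Dense attack: every pixel changes, but only by a small amount.
x_dense = [v + 0.01 for v in x]

sparse_l0, sparse_l2 = l0_distance(x, x_sparse), l2_distance(x, x_sparse)
dense_l0, dense_l2 = l0_distance(x, x_dense), l2_distance(x, x_dense)

print(f"sparse: l0={sparse_l0}, l2={sparse_l2:.3f}")
print(f"dense:  l0={dense_l0}, l2={dense_l2:.3f}")
```

The sparse perturbation is small in ℓ₀ but comparatively large in ℓ₂, while the dense one is the reverse, which is why a guarantee stated for one metric does not automatically carry over to the other.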