Real-world image classifiers frequently operate under unknown corruptions that degrade both accuracy and confidence in unpredictable ways. (1) This study evaluates the robustness of 37 neural network models for image classification under diverse corruptions through a black-box stress test. The evaluated models include conventional and modern convolutional neural networks such as AlexNet, ResNet, and Inception; noise-robust variants such as Noisy Student and AugMix-ResNet; and vision transformers such as ViT, DeiT, and Swin. (2) Fifteen corruption types are applied to the ImageNet ILSVRC2012 validation set. These include common corruptions such as Gaussian, speckle, and salt-and-pepper noise, as well as structured corruptions including random lines, random crosses, and confusion blocks. (3) To distinguish accuracy degradation from shifts in model confidence, this work complements the Corruption Error (CE) metric with the proposed Accuracy Confidence Divergence (ACD), which summarizes the directional gap between accuracy and predicted confidence across corruption severities. CE uses AlexNet as the reference model, and larger CE values indicate greater accuracy degradation under corruption. Our results show that some corruptions degrade model performance far more severely than others, especially structured corruptions such as random lines and horizontal random stripes, whose CE values exceed 0.50. Among common corruptions, the CE for salt-and-pepper noise exceeds 0.43, whereas the lower CE bounds for the others lie roughly between 0.24 and 0.33. These findings highlight distinct vulnerabilities in modern architectures and demonstrate the importance of extended robustness benchmarks that explicitly include structured and nonstandard corruption types.
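The two metrics named above can be illustrated with a minimal sketch. The CE function follows the commonly used AlexNet-normalized form: a model's error summed over severities, divided by AlexNet's summed error on the same corruption. The abstract does not give the ACD formula, so the version below is only an assumption: it takes ACD to be the mean signed gap between predicted confidence and accuracy across severities. The function names and example values are hypothetical and are not results from the study.

```python
from typing import Sequence


def corruption_error(model_err: Sequence[float],
                     alexnet_err: Sequence[float]) -> float:
    """CE for one corruption type.

    Each sequence holds top-1 error rates, one entry per severity level.
    The model's summed error is normalized by AlexNet's summed error,
    so larger values mean greater degradation relative to the reference.
    """
    if len(model_err) != len(alexnet_err):
        raise ValueError("error lists must cover the same severity levels")
    reference_total = sum(alexnet_err)
    if reference_total == 0:
        raise ValueError("reference error must be non-zero")
    return sum(model_err) / reference_total


def accuracy_confidence_divergence(accuracy: Sequence[float],
                                   confidence: Sequence[float]) -> float:
    """ASSUMED form of ACD (not taken from the paper).

    Mean of (confidence - accuracy) over severity levels. Under this
    assumption a positive value indicates overconfidence and a negative
    value indicates underconfidence.
    """
    if len(accuracy) != len(confidence) or len(accuracy) == 0:
        raise ValueError("need matching, non-empty per-severity lists")
    gaps = [c - a for a, c in zip(accuracy, confidence)]
    return sum(gaps) / len(gaps)


# Hypothetical per-severity numbers, for illustration only.
ce = corruption_error([0.2, 0.4], [0.4, 0.8])
acd = accuracy_confidence_divergence([0.8, 0.6], [0.9, 0.8])
print(f"CE = {ce:.2f}, ACD = {acd:+.2f}")
```

Keeping the sign of the gap, rather than its absolute value, is what would make such a summary directional: it separates a model that stays confident while its accuracy collapses from one that becomes overly cautious.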
Erkara et al. (Thu,) studied this question.