What question did this study set out to answer?

The central aim is to assess the robustness of various neural network models against image corruptions.

April 24, 2026 · Open Access

Blind confusion of classification networks: A black box evaluation under common and structured image corruptions

Key Points

Assessed the robustness of neural network image classifiers against image corruptions through a black box stress test.
Evaluated 37 models, including convolutional networks, noise-robust variants, and vision transformers.
Applied 15 corruption types to the ImageNet ILSVRC2012 validation set.
Introduced Accuracy Confidence Divergence (ACD) to separate accuracy degradation from shifts in model confidence.
Structured corruptions such as random lines and random stripes horizontal degraded performance most severely, with CE values exceeding 0.50.
Among common corruptions, salt and pepper noise produced CE values above 0.43.
Findings reveal distinct vulnerabilities in modern architectures and support robustness benchmarks that include structured and nonstandard corruptions.

Abstract

Real-world image classifiers frequently operate under unknown corruptions that degrade both accuracy and confidence in unpredictable ways. (1) This study evaluates the robustness of 37 neural network models for image classification under diverse corruptions through a black box stress test. The evaluated models include conventional and modern convolutional neural networks, such as AlexNet, ResNet, and Inception; noise-robust variants, such as Noisy Student and AugMix-ResNet; and vision transformers, such as ViT, DeiT, and Swin. (2) Fifteen corruption types are applied to the ImageNet ILSVRC2012 validation set. These include common corruptions, such as Gaussian, speckle, and salt and pepper noise, as well as structured corruptions, including random lines, random crosses, and confusion blocks. (3) To distinguish accuracy degradation from shifts in model confidence, this work complements the Corruption Error (CE) metric with the proposed Accuracy Confidence Divergence (ACD), which summarizes the directional gap between accuracy and predicted confidence across corruption severities. CE uses AlexNet as the reference model, and larger CE values indicate greater accuracy degradation under corruption. Our results show that some corruptions degrade model performance more severely than others, especially structured corruptions such as random lines and random stripes horizontal, whose CE values exceed 0.50. Among common corruptions, salt and pepper noise yields CE values above 0.43, whereas the lower CE bounds for the others lie roughly between 0.24 and 0.33. These findings highlight distinct vulnerabilities in modern architectures and demonstrate the importance of extended robustness benchmarks that explicitly include structured and nonstandard corruption types.
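
To make the two metrics in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. `corruption_error` follows the commonly used ImageNet-C convention, in which a model's top-1 error is summed over severity levels and divided by the reference model's summed error; the abstract confirms only that AlexNet is the reference. `signed_confidence_gap` is a hypothetical stand-in for ACD: the abstract describes ACD as a directional gap between accuracy and predicted confidence across severities but does not give its formula, so the mean of (confidence minus accuracy) used here is an assumption. All function names and input numbers are illustrative and are not taken from the study.

```python
from typing import Sequence


def corruption_error(model_err: Sequence[float], ref_err: Sequence[float]) -> float:
    """CE for one corruption type (ImageNet-C convention, assumed here).

    model_err and ref_err hold top-1 error rates, one entry per severity
    level, for the evaluated model and the reference model respectively.
    """
    if not model_err or len(model_err) != len(ref_err):
        raise ValueError("need one error rate per severity for both models")
    ref_total = sum(ref_err)
    if ref_total == 0:
        raise ValueError("reference model has zero error; CE is undefined")
    return sum(model_err) / ref_total


def signed_confidence_gap(accuracy: Sequence[float], confidence: Sequence[float]) -> float:
    """Hypothetical ACD-like summary; NOT the paper's definition.

    Returns the mean over severities of (mean confidence - accuracy).
    Positive values suggest overconfidence, negative values underconfidence.
    """
    if not accuracy or len(accuracy) != len(confidence):
        raise ValueError("need one accuracy and one confidence per severity")
    gaps = [c - a for a, c in zip(accuracy, confidence)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Made-up error rates across five severity levels, for illustration only.
    model = [0.20, 0.25, 0.30, 0.35, 0.40]
    reference = [0.40, 0.50, 0.60, 0.70, 0.80]
    print(f"CE: {corruption_error(model, reference):.2f}")

    # Made-up accuracy and mean-confidence values for the same severities.
    acc = [0.80, 0.75, 0.70, 0.65, 0.60]
    conf = [0.85, 0.82, 0.80, 0.78, 0.75]
    print(f"signed gap: {signed_confidence_gap(acc, conf):+.2f}")
```

Under this convention, a CE below 1 means the model accumulates less error than the reference on that corruption. Keeping the confidence gap signed rather than absolute preserves its direction, which is the property the abstract attributes to ACD.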
