We introduce semantic overconfidence: the phenomenon in which a model’s output probability remains unchanged whether or not semantically strong but class-irrelevant features are present in the image. We use generative models to introduce such features and create three datasets of factual and counterfactual pairs for studying model predictive probabilities. Our experiments indicate that neural networks do suffer from this type of semantic challenge. We also provide empirical evidence suggesting that Bayesian methods have the potential to alleviate this problem.
Dimitrakopoulos et al. (Sun,) studied this question.
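To make the definition concrete, the following is a minimal sketch of how invariance across factual and counterfactual pairs could be quantified. The function name `probability_shift`, the choice of mean absolute difference as the summary statistic, and the example numbers are all illustrative assumptions, not the measure or results reported in this work.

```python
from statistics import mean


def probability_shift(p_factual, p_counterfactual):
    """Mean absolute change in predicted-class probability across pairs.

    p_factual[i] and p_counterfactual[i] are the probabilities a model
    assigns to the same class for the i-th factual image and for its
    counterfactual, which differs only by a class-irrelevant feature.
    (Hypothetical helper, not taken from the paper.)
    """
    if len(p_factual) != len(p_counterfactual):
        raise ValueError("each factual image needs exactly one counterfactual")
    return mean(abs(f - c) for f, c in zip(p_factual, p_counterfactual))


# Made-up probabilities for three image pairs (illustration only).
p_factual = [0.97, 0.98, 0.99]
p_counterfactual = [0.97, 0.96, 0.99]
shift = probability_shift(p_factual, p_counterfactual)
```

Under this sketch, a shift close to zero means the model's confidence barely responds to the added feature, which is the behaviour described above as semantic overconfidence.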