Deployment scenarios often include conditions not anticipated during training, so out-of-distribution (OOD) detection is essential for ensuring the reliability and security of neural networks. However, many existing OOD detectors are unstable: their performance degrades significantly when the dataset or model changes. This challenge highlights the need to approach OOD detection by examining intrinsic differences between in-distribution (ID) and OOD samples in terms of model capacities, rather than relying on their observable characteristics. In this paper, we propose Gradient-based Attribution Reliability for OOD Detection (GAROD), a novel method grounded in invariance to irrelevant inputs, a model capacity closely linked to generalization. We hypothesize that models exhibit this invariance on ID samples, and we classify samples for which the model lacks it as OOD. Specifically, GAROD uses gradient-based attribution to separate relevant from irrelevant pixels in an input sample and observes how the model’s decision changes after the irrelevant pixels are removed. The approach most closely related to ours is attribution reliability evaluation (e.g., the Insertion and Deletion metrics). However, these metrics have never been applied to OOD detection, and applying them directly does not yield effective results. We identify two key issues: (1) model outputs are insufficient to capture decision changes effectively, and (2) the Insertion or Deletion metric used individually lacks comprehensiveness. GAROD addresses these issues by observing final features instead of model outputs and by fusing both metrics to achieve robust OOD detection. Extensive experiments on CIFAR and ImageNet benchmarks demonstrate GAROD’s superiority over state-of-the-art post-hoc methods, as well as its resilience to performance degradation under dataset/model variations. Code: https://github.com/iceshade000/GAROD
Zheng et al. studied this question.
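The scoring idea outlined in the abstract (attribute, split pixels into relevant and irrelevant, then compare final features before and after removal) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and is not taken from the GAROD code: the toy one-hidden-layer network, the gradient-times-input attribution, the zero baseline used to "remove" pixels, the cosine-similarity comparison, the simple difference used to fuse the two views, and the names `features`, `attribution`, `invariance_score`, and `keep_frac`.

```python
import math


def features(x, W1):
    """Final features of a toy network: h = tanh(W1 x) (illustrative model)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]


def logits(h, W2):
    """Linear classification head on top of the final features."""
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]


def attribution(x, W1, W2):
    """|gradient * input| of the predicted-class logit, computed analytically."""
    h = features(x, W1)
    z = logits(h, W2)
    c = max(range(len(z)), key=z.__getitem__)  # predicted class
    # d z_c / d x_j = sum_k W2[c][k] * (1 - h_k^2) * W1[k][j]
    grad = [
        sum(W2[c][k] * (1.0 - h[k] ** 2) * W1[k][j] for k in range(len(h)))
        for j in range(len(x))
    ]
    return [abs(g * xj) for g, xj in zip(grad, x)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(u * v for u, v in zip(a, b)) / (na * nb)


def invariance_score(x, W1, W2, keep_frac=0.5):
    """Higher score = features are more invariant to removing irrelevant pixels.

    Assumption: the two views are fused by a plain difference of cosine
    similarities; the paper's actual fusion rule may differ.
    """
    attr = attribution(x, W1, W2)
    k = max(1, int(round(keep_frac * len(x))))
    order = sorted(range(len(x)), key=attr.__getitem__, reverse=True)
    relevant = set(order[:k])
    # View 1: irrelevant pixels removed (zero baseline), relevant pixels kept.
    only_relevant = [xj if j in relevant else 0.0 for j, xj in enumerate(x)]
    # View 2: relevant pixels removed, irrelevant pixels kept.
    only_irrelevant = [0.0 if j in relevant else xj for j, xj in enumerate(x)]
    h = features(x, W1)
    keep_sim = cosine(h, features(only_relevant, W1))
    drop_sim = cosine(h, features(only_irrelevant, W1))
    return keep_sim - drop_sim
```

In this sketch a sample would be flagged as OOD when `invariance_score` falls below a threshold chosen on held-out ID data; the threshold and `keep_frac` are hypothetical tuning knobs. A real image classifier would replace the analytic gradient with automatic differentiation.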