Convolutional neural network (CNN)-based detectors of wooden-house damage for Japan’s Earthquake Damage Certification (EDC) survey offer little insight into their intermediate processing, which can undermine homeowners’ trust and hinder practical adoption. To address this issue, this research proposes an explainable, diagnosable ResNet-50 detector that combines three components: feature-map visualization modules (FVMs) that visualize feature-map representations; a human-attention alignment loss with human-attention masks that supervise the network to extract the features human experts attend to; and a corresponding diagnostic paradigm. Test results indicate that the explainability and human–machine alignment of the ResNet-50 detector improve substantially, with only a minor loss in performance. Unlike methods that rely solely on post hoc class activation maps to explain the final decision, the proposed method exposes how intrinsic feature maps evolve throughout the backbone, so that the model's decision can be explained and diagnosed by tracing, from input to final detection, which features are used, missed, or misused.
Wu et al. (Sun,) studied this question.
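
The abstract does not give the form of the human-attention alignment loss, so the following is only an illustrative sketch of how such a term might be computed. It assumes that a saliency map is obtained by averaging feature maps over channels, that the map is min-max normalized, and that it is compared with a binary human-attention mask using mean squared error. The function names (`channel_mean`, `alignment_loss`, `total_loss`) and the weighting parameter `weight` are hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]  # one H x W map stored as nested lists


def channel_mean(feature_maps: List[Grid]) -> Grid:
    """Average C feature maps (each H x W) into a single H x W saliency map."""
    c = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(fm[i][j] for fm in feature_maps) / c for j in range(w)]
            for i in range(h)]


def min_max_normalize(grid: Grid, eps: float = 1e-8) -> Grid:
    """Rescale values to roughly [0, 1]; eps avoids division by zero."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo + eps) for v in row] for row in grid]


def alignment_loss(feature_maps: List[Grid], mask: Grid) -> float:
    """Assumed form: MSE between normalized saliency and the human mask."""
    saliency = min_max_normalize(channel_mean(feature_maps))
    h, w = len(mask), len(mask[0])
    squared_error = sum((saliency[i][j] - mask[i][j]) ** 2
                        for i in range(h) for j in range(w))
    return squared_error / (h * w)


def total_loss(detection_loss: float, feature_maps: List[Grid],
               mask: Grid, weight: float = 0.5) -> float:
    """Detection loss plus a weighted alignment term (weight is an assumption)."""
    return detection_loss + weight * alignment_loss(feature_maps, mask)


# Toy 2 x 2 example: the expert marked only the top-left cell as relevant.
mask = [[1.0, 0.0], [0.0, 0.0]]
aligned = [[[4.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [0.0, 0.0]]]
misaligned = [[[0.0, 0.0], [0.0, 4.0]], [[0.0, 0.0], [0.0, 2.0]]]

loss_aligned = alignment_loss(aligned, mask)
loss_misaligned = alignment_loss(misaligned, mask)
```

In this toy case the feature maps that respond where the expert looked receive a loss near zero, while the maps that respond in the opposite corner are penalized. In a real detector the same idea would be applied to tensors from selected backbone stages, with masks resized to each stage's spatial resolution.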