The application of deep learning models in resource-constrained environments requires a trade-off among accuracy, efficiency, and interpretability a trilemma frequently neglected in conventional knowledge distillation (KD) approaches. This paper introduces an interpretable knowledge distillation framework that transcends these goals through three innovations: (1) multi-granular semantic alignment for hierarchical feature structure preservation, (2) attention-gated distillation to impose spatial reasoning consistency, and (3) concept activation preservation for human-interpretable decision logic. Measured on CIFAR-10 and CUB-200-2011 benchmark datasets, our method obtains 92% accuracy (99. 57% retention) and 86. 94% accuracy (89. 54% retention), respectively, with 0. 576 average saliency similarity to the teacher model. By lowering the complexity of the model by 13. 7 compression ratio without forgoing interpretability, our method facilitates the deployment of interpretable, resource-efficient models in safety-critical settings like medical diagnosis and ecological surveillance. This paper closes the performance explainability gap, pushing the frontiers of reliable AI for edge computing. This framework maintains competitive performance relative to dominant baselines, whilst also optimizing all three aspects of the trilemma, something which prior work has not tackled.
Singh et al. (Tue,) studied this question.