In recent years, research on reinforcement learning-based intelligent air combat strategies has become abundant; however, interpretability analyses of the resulting models remain scarce. This paper addresses three core issues: (A) how to automatically extract tactical knowledge from Reinforcement Learning (RL) air combat models; (B) what drives RL models in critical tactical maneuver decisions; and (C) how to optimize the performance of intelligent strategies through model analysis. To this end, this paper proposes: (1) a Normalized Temporal Probability Matrix (NTPM) and feature kernels to quantitatively characterize Tactical Strategy Patterns (TSPs); (2) a single-feature clustering algorithm based on optimal masked kernels, together with a multi-feature selection and ensemble method that integrates diversity metrics and SHapley Additive exPlanations (SHAP) values, which achieves a clustering accuracy of 0.77, compared with 0.72 for the prior state of the art; (3) a SHAP value analysis to identify the features that most strongly influence critical decision points; and (4) a closed-loop pathway from interpretability to optimization, built on statistical analysis of “lethal states” in tactical strategy patterns. This study improves the interpretability of RL air combat models and offers a reference for theoretical and practical advances in intelligent air combat decision-making.
Liu et al. (Sun,) studied this question.