What question did this study set out to answer?

This research aims to improve the interpretability and optimization of reinforcement learning models in air combat scenarios.

March 16, 2026 · Open Access

Mining and analysis of tactical strategy patterns from reinforcement learning model in close-range air combat

Key points

Developed a normalized temporal probability matrix and feature kernels to analyze tactical strategies.
Applied a clustering algorithm based on optimal masked kernels for feature analysis.
Used SHAP values to identify the features that most influence critical decision points.
Conducted statistical analysis of lethal states to establish a closed-loop pathway for optimization.
Achieved a clustering accuracy of 0.77, outperforming the previous state-of-the-art of 0.72.
Identified key features significantly impacting critical tactical decisions during air combat.
Demonstrated a structured approach linking model interpretability with strategy performance optimization.

Abstract

In recent years, research on reinforcement learning-based intelligent air combat strategies has become abundant; however, interpretability analyses of the resulting models remain scarce. This paper addresses the following core issues: (A) how to automatically extract tactical knowledge from Reinforcement Learning (RL) air combat models; (B) what drives RL models in critical tactical maneuver decisions; (C) how to optimize intelligent strategy performance via model analyses. To this end, this paper proposes: (A) a Normalized Temporal Probability Matrix (NTPM) and feature kernels to quantitatively characterize Tactical Strategy Patterns (TSPs); (B) a single-feature clustering algorithm based on optimal masked kernels, plus a multi-feature selection and ensemble method integrating diversity metrics and SHapley Additive exPlanations (SHAP) values, achieving a clustering accuracy of 0.77 (surpassing the previous state of the art at 0.72); (C) SHAP value analysis to identify the features that most impact critical decision points; (D) a closed-loop pathway from interpretability to optimization via statistical analysis of “lethal states” in tactical strategy patterns. This study significantly enhances RL model interpretability, offering key references for theoretical and practical advancements in intelligent air combat decision-making.
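To make the SHAP-based attribution in item (C) more concrete, the sketch below computes exact Shapley values for a single decision by enumerating feature subsets. It is only an illustration of the underlying idea, not the paper's pipeline: the scoring function `toy_policy_score`, the feature names (distance, aspect angle, closure rate), and the all-zero baseline are hypothetical, and practical SHAP tooling typically approximates these values rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_policy_score(z):
    # Hypothetical stand-in for an RL policy's preference for one maneuver.
    distance, aspect_angle, closure_rate = z
    return 2.0 * aspect_angle - 0.5 * distance + aspect_angle * closure_rate


x = [1.0, 0.8, 0.5]          # assumed feature values at a decision point
baseline = [0.0, 0.0, 0.0]   # assumed reference state
phi = shapley_values(toy_policy_score, x, baseline)

for name, contribution in zip(["distance", "aspect_angle", "closure_rate"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the score at `x` and the score at the baseline, which is the property that lets per-feature contributions be ranked for a given decision point. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for small feature sets like this toy one.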
