This paper addresses the pursuit control problem for multi-agent systems subject to orbital dynamics constraints in high-dimensional spaces, where the targets follow rule-compliant Keplerian motion. An adaptive multi-agent dueling deep Q-network (AMA-DDQN) is designed, in which the AMA component dynamically adjusts exploration policies and generates orbital viability rewards, and the DDQN component decouples state-value and action-advantage estimation for dimensionality reduction. Compared with Deep Deterministic Policy Gradient (DDPG), the proposed AMA-DDQN achieves faster convergence, better orbital constraint compliance, and more stable coordinated control.
Wang et al. studied this question.
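The abstract states that the dueling network decouples state-value and action-advantage estimation but does not give the aggregation rule. As an illustration only, the sketch below assumes the commonly used mean-centred form Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'). The function names `dueling_q_values` and `greedy_action` and the numeric inputs are hypothetical and are not taken from the paper.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a scalar V(s) with per-action advantages A(s, a) into Q(s, a).

    Assumption: mean-centred aggregation, so the Q-values average to V(s).
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical outputs of the value stream and the advantage stream
# for one agent with three discrete actions.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
best = greedy_action(q)
```

Subtracting the mean advantage keeps the two streams identifiable: without it, a constant could be shifted between V(s) and A(s, a) without changing Q(s, a).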