Anticipating opponents’ intentions in multi-agent systems is critical for rapid, robust decision-making in domains from autonomous unmanned aerial vehicle (UAV) coordination to competitive strategy games. However, most existing methods rely on unrealistic access to private opponent information or fail to capture the recursive reasoning humans use to adapt in dynamic, partially observable environments. We address these gaps with a hierarchical world-opponent modeling framework that unifies environment dynamics prediction and intention-strategy reasoning in a single architecture, without requiring private data. Inspired by human social inference, our method uses multiple learnable intention and strategy queries over local observations to recursively update opponent models, anticipate future trajectories, and adapt strategies in real time. Joint optimization of the world and opponent models captures the mutual influence between environment transitions, intentions, and maneuvers, yielding sample-efficient learning. Across benchmarks, including close-range multi-UAV engagements and the StarCraft Multi-Agent Challenge, our approach achieves up to 5.3 times faster learning than model-free multi-agent reinforcement learning baselines, while consistently improving maneuver effectiveness and decision intelligence. These results demonstrate a scalable, high-efficiency solution for adversarial reasoning in complex multi-agent cooperative-competitive settings.
Cheng et al. studied this question.
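The abstract describes intention and strategy queries that read local observations and are refined recursively. The following is a minimal sketch of that general idea using plain scaled dot-product cross-attention. It is not the authors' architecture: the function names, the `mix` blending factor, the dimensions, and the random initialization are all hypothetical choices made for illustration. In a real system the queries would be parameters trained by gradient descent rather than fixed random vectors.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, tokens):
    """Each query attends over the observation tokens (scaled dot-product)."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, t) / scale for t in tokens])
        # Weighted sum of tokens: this query's read-out of the observation.
        out.append([sum(w * t[d] for w, t in zip(weights, tokens))
                    for d in range(len(tokens[0]))])
    return out


def recursive_update(queries, observations, mix=0.5):
    """Refine the query states one step at a time as observations arrive.

    `mix` (hypothetical) blends the previous state with the new read-out.
    """
    state = [list(q) for q in queries]
    for tokens in observations:
        read = cross_attend(state, tokens)
        state = [[(1 - mix) * s + mix * r for s, r in zip(s_vec, r_vec)]
                 for s_vec, r_vec in zip(state, read)]
    return state


# Toy setup (assumed sizes): 3 queries, 5 observation tokens, 2 time steps.
random.seed(0)
DIM, N_QUERIES, N_TOKENS, N_STEPS = 4, 3, 5, 2
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_QUERIES)]
observations = [[[random.gauss(0, 1) for _ in range(DIM)]
                 for _ in range(N_TOKENS)]
                for _ in range(N_STEPS)]
beliefs = recursive_update(queries, observations)
```

Here `beliefs` holds one refined vector per query after all observation steps. A downstream policy or trajectory predictor, not shown, would consume these vectors.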