Abstract

Multi-Agent Reinforcement Learning (MARL) relies on accurate modeling of inter-agent interactions for effective coordination. Existing graph-based methods cannot infer dynamic coordination structures from high-level semantics and instead rely on trajectory-based interaction patterns, which leads to suboptimal policies and unstable credit assignment. Moreover, agents are often hindered by manually designed static rewards that neither adapt to dynamic contexts nor address the challenges of sparse-reward environments. To address these limitations, we propose the LLM-Guided Graph Neural Coordination Framework (LLM-GNCF), which systematically integrates Large Language Model (LLM) semantic reasoning with graph neural coordination. LLM-GNCF uses an LLM to dynamically construct a Team-Adaptive Coordination Graph (TACG) from real-time LLM-guided strategic semantics and actions, which are validated by a Tactical Critic model. This graph serves as a structural prior that guides localized information exchange at both the agent level and the group level, thereby facilitating cooperative policy optimization. We further introduce an LLM-empowered latent reward shaping method, supported by a Chain of Aggregation mechanism that aggregates multi-frame information to provide fine-grained, context-aware semantic feedback while limiting computational overhead. This reward signal further improves coordination and strategy refinement, particularly under sparse-reward conditions. In addition, LLM-GNCF employs a two-stage training paradigm in which LLM guidance reduces blind exploration in MARL and accelerates training convergence. Experiments on challenging StarCraft II micromanagement tasks suggest that the proposed framework achieves competitive performance against representative baselines in terms of coordination efficiency and policy generalization.
Kuang et al. (Mon,) studied this question.