What question did this study set out to answer?

This research aims to improve coordination in multi-agent systems by integrating large language models with graph neural networks.

June 10, 2026 · Open Access

LLM-guided graph neural coordination framework for cooperative multi-agent reinforcement learning

Key points

Developed the LLM-Guided Graph Neural Coordination Framework (LLM-GNCF) for dynamic agent coordination.
Implemented a team-adaptive coordination graph leveraging real-time LLM guidance.
Introduced a two-stage training paradigm to enhance exploration in MARL settings.
LLM-GNCF achieved competitive performance against representative baselines in coordination efficiency and policy generalization on StarCraft II micromanagement tasks.
Dynamic reward shaping improved agent adaptability in sparse reward environments.
Reduced training convergence time due to LLM guidance and structured reward feedback.
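
The reward-shaping point can be illustrated with a minimal sketch. It is not the paper's method: the function name `shaped_reward`, the `weight` parameter, and the per-frame scores are all hypothetical, and a plain mean stands in for whatever aggregation the authors use. It only shows how feedback aggregated over several frames might supplement a sparse environment reward.

```python
from typing import Sequence


def shaped_reward(env_reward: float,
                  frame_scores: Sequence[float],
                  weight: float = 0.1) -> float:
    """Add a small semantic bonus to a (possibly sparse) environment reward.

    frame_scores: hypothetical per-frame feedback scores, standing in for
    whatever signal an LLM-based evaluator would produce (assumption).
    """
    if not frame_scores:
        # No semantic feedback for this window: fall back to the raw reward.
        return env_reward
    # Aggregate the window of frame scores into one value (here: a plain mean).
    aggregated = sum(frame_scores) / len(frame_scores)
    return env_reward + weight * aggregated
```

With a zero environment reward, the agent still receives a learning signal whenever the aggregated score is non-zero, which is the intuition behind shaping in sparse-reward settings.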

Abstract

Multi-Agent Reinforcement Learning (MARL) relies on accurate modeling of inter-agent interactions for effective coordination. Existing graph-based methods fail to infer dynamic coordination structures from high-level semantics and still rely on trajectory-based interaction patterns, leading to suboptimal policies and unstable credit assignment. Moreover, agents are often hindered by manually designed static rewards that fail to adapt to dynamic contexts and cannot address the challenges of sparse reward environments. To address these limitations, we propose the LLM-Guided Graph Neural Coordination Framework (LLM-GNCF), which establishes a systematic integration between Large Language Model semantic reasoning and Graph Neural Coordination. LLM-GNCF leverages an LLM to dynamically construct a Team-Adaptive Coordination Graph (TACG) based on real-time LLM-guided strategic semantics and actions, which are validated by the Tactical Critic model. This graph serves as a structural prior that guides localized information exchange at both the agent level and the group level, thus facilitating cooperative policy optimization. Crucially, we introduce an LLM-empowered latent reward shaping method supported by a Chain of Aggregation mechanism that aggregates multi-frame information to provide fine-grained, context-aware semantic feedback while managing computational overhead. This signal further enhances coordination and strategy refinement, particularly under sparse reward conditions. Furthermore, LLM-GNCF employs a two-stage training paradigm in which LLM guidance reduces blind exploration in MARL and accelerates training convergence. Experiments on the challenging StarCraft II micromanagement tasks suggest that the proposed integrated framework achieves competitive performance compared to representative baselines in terms of coordination efficiency and policy generalization.
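
To make the idea of a coordination graph guiding localized information exchange more concrete, here is a minimal sketch of one round of message passing between agents. It is a generic mean-aggregation step, not the TACG construction or the network described in the paper; the function name `message_passing_step`, the fixed 0.5 mixing weights, and the hand-written adjacency matrix are assumptions made for illustration.

```python
from typing import List


def message_passing_step(features: List[List[float]],
                         adjacency: List[List[int]]) -> List[List[float]]:
    """One round of mean-aggregation message passing over an agent graph.

    features[i] is agent i's feature vector; adjacency[i][j] == 1 means
    agent i receives information from agent j (hypothetical encoding).
    """
    n = len(features)
    dim = len(features[0])
    updated = []
    for i in range(n):
        neighbours = [j for j in range(n) if adjacency[i][j] and j != i]
        if not neighbours:
            # Isolated agent: keep its own features unchanged.
            updated.append(list(features[i]))
            continue
        # Average the neighbours' features, dimension by dimension.
        mean_msg = [sum(features[j][d] for j in neighbours) / len(neighbours)
                    for d in range(dim)]
        # Mix the agent's own features with the aggregated message.
        updated.append([0.5 * features[i][d] + 0.5 * mean_msg[d]
                        for d in range(dim)])
    return updated


features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
adjacency = [[0, 1, 0],   # agent 0 listens to agent 1
             [1, 0, 0],   # agent 1 listens to agent 0
             [0, 0, 0]]   # agent 2 is not connected
result = message_passing_step(features, adjacency)
```

In a framework like the one described, the adjacency matrix would change over time as the graph is rebuilt; here it is fixed so that the effect of the graph on information flow is easy to see.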


Cite This Study

Kuang et al. studied this question.

synapsesocial.com/papers/6a28ffc76f82f25be989c9b3 https://doi.org/10.1007/s40747-026-02356-7
