The proliferation of large language model (LLM) agents has enabled increasingly complex multi-step automation; however, composing multiple agents into coherent systems introduces significant orchestration challenges that remain poorly documented. This survey examines LLM-based multi-agent orchestration from 2023 through early 2026 (literature cutoff: March 2026), with explicit attention to the evidence hierarchy used to interpret deployment claims. We propose a three-topology, one-adaptivity taxonomy (centralized, decentralized, and hierarchical coordination topologies, each optionally augmented with a dynamic–adaptive control axis) grounded in classical multi-agent systems theory and recent empirical evidence. We compare six leading frameworks (LangGraph, CrewAI, AutoGen/Microsoft Agent Framework, OpenAI Agents SDK, MetaGPT, and DSPy) along axes directly relevant to practitioners: state-management granularity, token-cost structure, failure-recovery options, and design philosophy. We examine the emerging protocol stack in three respects: why MCP (agent-to-tool) and A2A (agent-to-agent) occupy complementary layers, how the ACP–A2A merger signals protocol convergence, and where ANP’s decentralized-discovery design fits. Production design considerations (state management, task planning, error handling, scalability, and security) are evaluated with reference to published benchmarks. Vendor-reported figures are marked † throughout and held to a documented evidence hierarchy, which separates them from peer-reviewed and government-evaluator measurements. We close by identifying eight open challenges and proposing a six-dimension evaluation framework for multi-agent coordination quality. This paper offers practitioners a decision framework covering taxonomy, framework selection, protocol adoption, and early operational pilots.
Zhu et al. (Mon,) studied this question.