Sixth-Generation (6G) networks demand higher speed, lower delay, greater intelligence, and seamless global coverage, which makes Quality of Service (QoS)-aware routing even more challenging. Existing approaches for heterogeneous services often depend on static path selection, centralized control, or limited prediction, making them unsuitable for highly dynamic environments. In particular, most existing multi-agent or graph-based routing methods either rely on centralized training or implicit coordination mechanisms, or employ static graph representations that fail to capture dynamic inter-node dependencies and evolving traffic patterns. To address this gap, we introduce an online routing strategy based on a Deep Reinforcement Learning Multi-Agent Decision-making Algorithm (DRL-MDA) within the Software-Defined Networking (SDN) framework. Unlike conventional centralized or graph-embedding-based approaches, the proposed method models the network as a graph environment and adopts a fully decentralized paradigm: distributed agents route hop by hop while being implicitly coordinated through an attention-driven global representation. To improve learning efficiency and expressiveness, we design an Attention-Augmented Network Simulator (A2-Sim) that uses multi-head attention during offline training to capture correlations among nodes, links, and evolving traffic. This design explicitly decouples global traffic modeling from local decision-making, enabling agents to coordinate through shared latent representations rather than explicit message passing or centralized control. We evaluate delay-sensitive, throughput-sensitive, and delay-loss-sensitive services on the GEANT and Abilene topologies. On GEANT, the results show reductions of 20.0% in delay and 23.3% in loss ratio, together with a 14.3% gain in throughput. Real-time Mininet emulation further demonstrates feasibility in SDN infrastructures.
Zhang et al. (Fri,) studied this question.
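
As an illustration of the hop-by-hop, per-node decision-making described in the abstract, the following minimal sketch places one agent at each node and lets each agent pick the next hop from purely local estimates. It is not DRL-MDA or A2-Sim: it substitutes classic tabular Q-routing for the deep policy, omits the attention-based simulator entirely, and runs on a hypothetical four-node topology with made-up link delays. All names (`LINKS`, `NodeAgent`, `route`, `train`, `path_delay`) are assumptions introduced for illustration.

```python
import random

# Hypothetical toy topology (NOT GEANT or Abilene): node -> {neighbor: link delay in ms}.
LINKS = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0, "D": 4.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {"B": 4.0, "C": 1.0},
}


class NodeAgent:
    """One agent per node. q[dst][next_hop] estimates the delay to dst via next_hop.

    Tabular Q-routing is used here as a stand-in for a learned deep policy.
    """

    def __init__(self, node, neighbors, alpha=0.5, epsilon=0.1):
        self.node = node
        self.neighbors = list(neighbors)
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability
        self.q = {}

    def estimates(self, dst):
        # Lazily create per-destination estimates, initialised to 0 (optimistic).
        return self.q.setdefault(dst, {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dst):
        # Estimated remaining delay from this node to dst.
        if self.node == dst:
            return 0.0
        return min(self.estimates(dst).values())

    def choose(self, dst, rng, explore=True):
        # Epsilon-greedy choice of next hop using only local estimates.
        est = self.estimates(dst)
        if explore and rng.random() < self.epsilon:
            return rng.choice(self.neighbors)
        return min(est, key=est.get)

    def update(self, dst, next_hop, link_delay, downstream):
        # Move the estimate toward: observed link delay + neighbor's own estimate.
        est = self.estimates(dst)
        target = link_delay + downstream
        est[next_hop] += self.alpha * (target - est[next_hop])


def route(agents, src, dst, rng, explore=True, max_hops=20):
    """Forward one packet hop by hop; each visited agent decides and learns locally."""
    path, node = [src], src
    while node != dst and len(path) <= max_hops:
        agent = agents[node]
        nxt = agent.choose(dst, rng, explore)
        agent.update(dst, nxt, LINKS[node][nxt], agents[nxt].best_estimate(dst))
        node = nxt
        path.append(node)
    return path


def path_delay(path):
    # Sum of link delays along a path.
    return sum(LINKS[a][b] for a, b in zip(path, path[1:]))


def train(episodes=2000, seed=0):
    """Train all agents on random source/destination pairs."""
    rng = random.Random(seed)
    agents = {n: NodeAgent(n, nbrs) for n, nbrs in LINKS.items()}
    nodes = list(LINKS)
    for _ in range(episodes):
        src, dst = rng.sample(nodes, 2)
        route(agents, src, dst, rng)
    return agents, rng


if __name__ == "__main__":
    agents, rng = train()
    path = route(agents, "A", "D", rng, explore=False)
    print(path, path_delay(path))
```

In this sketch the only information an agent receives from elsewhere in the network is its neighbor's scalar delay estimate. The abstract describes a richer arrangement in which agents are coordinated through a shared attention-driven representation, which this toy example does not attempt to reproduce.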