The rapid proliferation of devices on the Internet of Things (IoT) in smart city environments enables autonomous decision-making, but introduces challenges of scalability, coordination, and privacy. Existing reinforcement learning (RL) methods, such as Multi-Agent Actor–Critic (MAAC), depend on centralized critics and recurrent structures, which limit scalability and create single points of failure. This paper proposes a Federated Decision Transformer (FDT) framework that integrates transformer-based sequence modeling with federated learning. By replacing centralized critics with self-attention-driven trajectory modeling, the FDT preserves data locality, enhances privacy, and supports decentralized policy learning across distributed IoT nodes. We benchmarked the FDT against MAAC in a mobile edge computing (MEC) environment with identical hyperparameter configurations. The results demonstrate that the FDT achieves superior reward efficiency, scalability, and adaptability in dynamic IoT networks, although with slightly higher variance during early training. These findings highlight transformer-based federated RL as a robust and privacy-preserving alternative to critic-based methods for large-scale IoT systems.
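The two ingredients named above, trajectory-based sequence modeling and federated aggregation, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how trajectories are encoded or how local models are combined. The sketch assumes the returns-to-go conditioning commonly used with Decision Transformers and a sample-size-weighted average in the style of FedAvg. The names `returns_to_go`, `federated_average`, and `Params` are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def returns_to_go(rewards: List[float]) -> List[float]:
    """Suffix sums of rewards: rtg[t] = r[t] + r[t+1] + ... + r[T-1].

    Assumption: the local sequence model is conditioned on returns-to-go,
    as is common for Decision Transformers.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def federated_average(client_params: List[Params],
                      client_sizes: List[int]) -> Params:
    """Sample-size-weighted average of client parameters (FedAvg-style).

    Assumption: the server only sees parameters, never raw trajectories,
    so each node's data stays local.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Each IoT node would compute returns-to-go on its own trajectories...
    print(returns_to_go([1.0, 2.0, 3.0]))
    # ...train locally, then share only parameters for aggregation.
    node_a = {"w": [0.0, 2.0]}
    node_b = {"w": [4.0, 6.0]}
    print(federated_average([node_a, node_b], [1, 3]))
```

In this sketch only model parameters leave a node, which is the data-locality property the abstract attributes to the FDT. No central critic appears anywhere in the loop.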
Alterkawi et al. studied this question.