Abstract Adaptive traffic signal control (ATSC) is at the core of intelligent transportation systems. Properly calibrated signal plans can alleviate bottlenecks, preventing mounting congestion, while dysfunctional ones waste valuable public and private resources. Reinforcement learning (RL) based controllers excel in efficiency and cost savings; they react online and adapt those reactions as more data becomes available. Despite these advantages, incumbent systems can manage hundreds of intersections, while RL-based systems struggle to scale up. The limitation is the curse of dimensionality, which is intrinsic to the theoretical framework underpinning RL-based controllers: Markov decision processes (MDPs). The explosion of the state space renders single-agent RL-ATSC systems unviable at large scales. The computation must therefore be distributed across the network through a multi-agent reinforcement learning (MARL) system that searches for a locally (or globally) optimal joint policy. If the agents act independently, each optimizing its own policy, the training environment might become unstable, i.e., non-stationary, effectively preventing any learning from happening. Coordination mechanisms provide the means to learn at the inter-agent level or across the network, mitigating the non-stationarity of the environment and allowing agents to search for optimal policies. Moreover, this thesis considers the distributed learning paradigm, in which agents are geographically scattered and can communicate and interact only with their neighbors. The thesis addresses the challenge of scaling up by developing coordination mechanisms that can handle hundreds of intersections. The first part is a methodology, developed on previous works, aimed at policy optimization and evaluation of reinforcement learning-based approaches for ATSC.
The second part develops algorithms where network agents can learn with partial information of their surroundings by communicating and interacting with neighbors. They stem from a class of consensus-based algorithms that have convergence guarantees. The third and final part of this thesis investigates a decentralized algorithm for decoupling vast transportation networks into multiple regions that can be solved (almost) independently. This divide-and-conquer approach leverages the fact that agents on transportation networks are distant from one another.
Guilherme S. Varela (Tue,) studied this question.