Key points are not available for this paper at this time.
Optimization of a group of traffic signals over an area is a large, multi-agent-type real-time planning problem without a precise reference model being given. To do this planning, each signal should learn not only to acquire its control plans individually through reinforcement learning, but also to cooperate with other signals. These two objectives-distributed learning of agents and cooperation among agents-conflict with each other, and a method that blends these two objectives together is required. In the method proposed in this paper, these two objectives correspond to localized reinforcement learning and global combinatorial optimization, respectively, and the method thus achieves cooperation in the long term without bothering with autonomy. The outline of the idea is as follows: each agent performs reinforcement learning and reports its cumulative performance evaluation, and combinatorial optimization is simultaneously carried out to find appropriate parameters for long-term learning that maximize the total profit of the signals (agents).>
Mikami et al. (Tue,) studied this question.