We develop a method for quantifying the likelihood of observing collusive strategies among provably convergent decentralized multiagent reinforcement learning algorithms in a pricing setting. This is necessary to accurately assess the threat that colluding algorithms pose for society. The tools are, however, more generally applicable. Specifically, we obtain conditions for the weak acyclicity of families of two-player, symmetric Markov games in which best responses are unique. In this case, the individual best-response graphs (a concept we introduce in the article) belong to the class of functional relations. Using the structural properties of this class of graphs, we provide conditions on the individual best-response graphs for the game being weakly acyclic. In addition, we characterize the stationary distribution of the best-response strategy adjustment process in such games. Using these results, we show that Decentralized Q-learning is provably convergent in three two-player, two-action games with a memory of one period, analyze its probability of converging to different equilibria, and interpret the results in the context of algorithmic collusion.
Janusz M Meylahn (Fri,) studied this question.