What type of study is this?

September 10, 2025Open Access

Quantifying the likelihood of learning collusive strategy equilibria

Key Points

The developed method quantifies the likelihood of collusive strategies among reinforcement learning algorithms.
Results show Decentralized Q-learning can converge in specific two-player, two-action games.
Conditions for weak acyclicity in symmetric Markov games are established for better understanding collusion risks.
The research underscores the structural properties of best-response graphs that affect algorithmic behavior.

Abstract

We develop a method for quantifying the likelihood of observing collusive strategies among provably convergent decentralized multiagent reinforcement learning algorithms in a pricing setting. This is necessary to accurately assess the threat that colluding algorithms pose for society. The tools are, however, more generally applicable. Specifically, we obtain conditions for the weak acyclicity of families of two-player, symmetric Markov games in which best responses are unique. In this case, the individual best-response graphs (a concept we introduce in the article) belong to the class of functional relations. Using the structural properties of this class of graphs, we provide conditions on the individual best-response graphs for the game being weakly acyclic. In addition, we characterize the stationary distribution of the best-response strategy adjustment process in such games. Using these results, we show that Decentralized Q-learning is provably convergent in three two-player, two-action games with a memory of one period, analyze its probability of converging to different equilibria, and interpret the results in the context of algorithmic collusion.

Quantifying the likelihood of learning collusive strategy equilibria

Key Points

Abstract

Cite This Study