What question did this study set out to answer?

The study asks how algorithmic collusion by learned pricing agents can be audited and controlled at the level of the deployed policy, using a policy-graph framework, rather than from price traces alone.

May 26, 2026 | Open Access

Beyond Price Traces: Policy-Graph Audits for Algorithmic Collusion

Key Points

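Introduces a policy-graph audit pipeline that starts from trace-level EAD diagnostics (price elevation, cross-agent dependence, and shifted-null controls) to identify suspect regimes.
Freezes the learned policy, enumerates the induced policy graph, and measures basin-weighted attractor risk.
Tests control with a price-above-Nash objective guardrail and with post-training policy-graph repair.
In the canonical N=2 setting, every seed yields a single global attractor over all 225 joint-price states; 8 of 10 are high-price attractors and 2 of 10 are intermediate.
The price-above-Nash objective guardrail moves all tested full-state and no-state settings (N=2, 3, 4) to Nash-class dominant graphs.
Post-training repair removes all high-price N=2 attractors with only 2–11 Q-entry edits per seed, confirmed by global verification that the high-price basin is gone.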

Abstract

Algorithmic collusion is usually audited from price traces: did prices rise, did firms move together, and did deviations trigger punishment? We argue that this is the wrong primary object for learned agents. Once deployed, a finite learned pricing policy induces a state-transition graph; its long-run regimes are attractors with basins. Collusion-like behavior is therefore a policy-geometry property, not only a trace-level anomaly. We introduce a policy-graph audit pipeline for learned market agents. The pipeline first uses trace-level EAD diagnostics (price elevation, cross-agent dependence, and shifted-null controls) to identify suspect regimes. It then freezes the learned policy, enumerates the induced policy graph, measures basin-weighted attractor risk, localizes mechanism with state ablations and Q-value forensics, and tests control by objective guardrails and post-training graph repair. A simple finite-state proposition shows that every deployed deterministic policy decomposes into attractor basins; the empirical question is which basins are supracompetitive and what sustains them. In tabular Bertrand pricing, the audit exposes facts that price traces alone hide. In the canonical N=2 setting, 1M-episode Q-learning policies induce a single global attractor over all 225 joint-price states in every seed; 8/10 are high-price attractors and 2/10 are intermediate. No-state agents, which cannot condition on competitor history at all, learn high-price global attractors in 10/10 seeds, falsifying a simple competitor-observability explanation in this benchmark. Q-value forensics show that learned greedy actions often differ from one-shot best responses and are favored by learned continuation values. For control, a price-above-Nash objective guardrail moves all tested full/no-state N=2,3,4 settings to Nash-class dominant graphs. More sharply, post-training policy-graph repair removes all high-price N=2 attractors with only 2–11 Q-entry edits per seed, followed by global verification that the high-price basin is gone. These results recast algorithmic collusion as an auditable and controllable attractor-level failure of learned market policies.
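
The finite-state decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes two agents whose frozen greedy policies are lookup tables from the last joint-price state to each agent's next price index, and it assumes 15 price levels per agent (inferred only from 15 × 15 = 225 joint states). The toy policies, the helper names `induced_graph`, `attractor_basins`, and `basin_weighted_risk`, and the high-price threshold are all hypothetical choices for illustration.

```python
from itertools import product

N_PRICES = 15  # assumption: 15 price levels per agent, giving 225 joint states


def induced_graph(policy_a, policy_b, n_prices):
    """Map every joint-price state to its unique successor under frozen policies."""
    return {
        (pa, pb): (policy_a[(pa, pb)], policy_b[(pa, pb)])
        for pa, pb in product(range(n_prices), repeat=2)
    }


def attractor_basins(successor):
    """Return {attractor cycle (tuple of states): basin size}.

    In a finite deterministic graph every state has exactly one successor,
    so every trajectory ends in exactly one cycle (its attractor).
    """
    attractor_of = {}  # state -> index into `cycles`
    cycles = []
    for start in successor:
        path, position = [], {}
        state = start
        # Walk forward until we hit a labeled state or revisit our own path.
        while state not in attractor_of and state not in position:
            position[state] = len(path)
            path.append(state)
            state = successor[state]
        if state in attractor_of:
            cycle_id = attractor_of[state]
        else:
            cycle_id = len(cycles)
            cycles.append(tuple(path[position[state]:]))  # newly found cycle
        for visited in path:
            attractor_of[visited] = cycle_id
    basins = {cycle: 0 for cycle in cycles}
    for cycle_id in attractor_of.values():
        basins[cycles[cycle_id]] += 1
    return basins


def basin_weighted_risk(basins, is_high_price):
    """Share of all states whose long-run regime is flagged as high-price."""
    total = sum(basins.values())
    flagged = sum(size for cycle, size in basins.items() if is_high_price(cycle))
    return flagged / total


# Hypothetical toy policies: if both last prices are at least 8, both agents
# play 12; otherwise both fall back to 3.
states = list(product(range(N_PRICES), repeat=2))
toy_policy = {s: 12 if min(s) >= 8 else 3 for s in states}

graph = induced_graph(toy_policy, toy_policy, N_PRICES)
basins = attractor_basins(graph)

# Hypothetical threshold: a cycle is "high-price" if its mean price index > 8.
risk = basin_weighted_risk(
    basins,
    lambda cycle: sum(pa + pb for pa, pb in cycle) / (2 * len(cycle)) > 8,
)
```

In this toy graph there are two fixed-point attractors, `(12, 12)` with a basin of 49 states and `(3, 3)` with a basin of 176 states, so the basin-weighted share of high-price states is 49/225. The same enumeration shows what a post-training repair has to verify: after editing table entries, rerunning `attractor_basins` over all states confirms whether any high-price basin remains.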
