Algorithmic collusion is usually audited from price traces: did prices rise, did firms move together, and did deviations trigger punishment? We argue that this is the wrong primary object for learned agents. Once deployed, a finite learned pricing policy induces a state-transition graph; its long-run regimes are attractors with basins. Collusion-like behavior is therefore a policy-geometry property, not only a trace-level anomaly. We introduce a policy-graph audit pipeline for learned market agents. The pipeline first uses trace-level EAD diagnostics—price elevation, cross-agent dependence, and shifted-null controls—to identify suspect regimes. It then freezes the learned policy, enumerates the induced policy graph, measures basin-weighted attractor risk, localizes mechanism with state ablations and Q-value forensics, and tests control by objective guardrails and post-training graph repair. A simple finite-state proposition shows that every deployed deterministic policy decomposes into attractor basins; the empirical question is which basins are supracompetitive and what sustains them. In tabular Bertrand pricing, the audit exposes facts that price traces alone hide. In the canonical N=2 setting, 1M-episode Q-learning policies induce a single global attractor over all 225 joint-price states in every seed; 8/10 are high-price attractors and 2/10 are intermediate. No-state agents, which cannot condition on competitor history at all, learn high-price global attractors in 10/10 seeds, falsifying a simple competitor-observability explanation in this benchmark. Q-value forensics show that learned greedy actions often differ from one-shot best responses and are favored by learned continuation values. For control, a price-above-Nash objective guardrail moves all tested full/no-state N=2,3,4 settings to Nash-class dominant graphs. 
More sharply, post-training policy-graph repair removes all high-price N=2 attractors with only 2–11 Q-entry edits per seed, followed by global verification that the high-price basin is gone. These results recast algorithmic collusion as an auditable and controllable attractor-level failure of learned market policies.
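The finite-state proposition can be illustrated with a minimal sketch: a frozen deterministic joint policy defines a map from each joint-price state to its successor, so every state falls into exactly one cycle (attractor), and the states that flow into that cycle form its basin. The code below is an illustration under stated assumptions, not the authors' implementation. The 15-price grid per agent (giving 15 × 15 = 225 joint states for N=2), the toy pricing rule, and all function names are hypothetical.

```python
def attractor_basins(n_states, step):
    """Decompose the functional graph s -> step(s) into attractors and basins.

    Returns (attractors, basin_sizes, attractor_of), where attractors is a
    list of cycles (each a list of states) and attractor_of[s] is the index
    of the attractor that state s eventually reaches.
    """
    attractor_of = [None] * n_states
    attractors = []
    for start in range(n_states):
        if attractor_of[start] is not None:
            continue
        path, position = [], {}
        s = start
        # Walk forward until we hit a labeled state or revisit our own path.
        while attractor_of[s] is None and s not in position:
            position[s] = len(path)
            path.append(s)
            s = step(s)
        if attractor_of[s] is not None:
            aid = attractor_of[s]                 # joined a known basin
        else:
            aid = len(attractors)                 # closed a new cycle
            attractors.append(path[position[s]:])
        for v in path:
            attractor_of[v] = aid
    basin_sizes = [0] * len(attractors)
    for aid in attractor_of:
        basin_sizes[aid] += 1
    return attractors, basin_sizes, attractor_of


K = 15  # hypothetical price-grid size per agent; K * K = 225 joint states


def toy_policy(own, rival):
    # Hypothetical frozen greedy policy: move one step above the higher of
    # the two last prices, capped at the top of the grid.
    return min(max(own, rival) + 1, K - 1)


def step(s):
    # Joint state s encodes last period's prices (p1, p2) as p1 * K + p2.
    p1, p2 = divmod(s, K)
    return toy_policy(p1, p2) * K + toy_policy(p2, p1)


attractors, basin_sizes, attractor_of = attractor_basins(K * K, step)

# Basin share of each attractor, a simple ingredient of a basin-weighted
# risk score (the exact risk measure here is an assumption).
basin_share = [size / (K * K) for size in basin_sizes]
# Mean price index on each attractor cycle, used to label it high or low.
mean_price = [
    sum(sum(divmod(s, K)) for s in cyc) / (2 * len(cyc)) for cyc in attractors
]
```

In this toy rule every one of the 225 states drains into a single fixed point at the top of the grid, which mirrors the single-global-attractor pattern described above. With a real learned Q-table, `toy_policy` would be replaced by each agent's greedy action lookup.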
Chao Zhou (Thu,) studied this question.