A recent empirical study (Gao and Xiao, 2026) deployed 150 autonomous Claude Code agents to test six market quality hypotheses on identical NYSE TAQ data. The authors frame their headline finding as dispersion: nonstandard errors arising from agent-to-agent variation in analytical choices. This framing misidentifies the governance-relevant contribution. Dispersion from underspecified tasks is expected, bounded, and already well documented in the literature on human researchers. The paper's actual discovery is different: model families exhibit stable, foreseeable, architecture-dependent analytical priors that the authors term "empirical styles." Sonnet agents chose autocorrelation for market efficiency 87% of the time across 100 independent runs. Opus agents chose variance ratio 100% of the time. These preferences are not stochastic noise. They are systematic priors embedded in model weights, invisible to operators, and undisclosed by providers. This paper argues that model-family empirical style meets the definitional bar for a governance variable (stable, foreseeable, materially outcome-affecting, operator-inaccessible, and provider-undisclosed) and derives the substrate-layer governance requirements that follow. It further argues that the AI-as-evidence category error is the conceptual frame that has prevented the field from recognizing this implication. Finally, it identifies the distinction between structural records and interpretive outputs as the epistemological boundary that makes substrate-layer governance possible and that defines the limit of what governed AI tools can legitimately produce.
Narnaiezzsshaa Truong (Wed,) studied this question.