When LLMs audit artifacts from the same semantic direction used to generate them, they re-enter the same compressed manifold region, producing rising confidence while discovery stalls --- a failure mode we term Generator-Auditor Symmetry (GAS). Orthogonal Auditing by Rotation (OAR) escapes this by selecting probe directions via cosine distance in embedding space, accumulating confirmed defect classes in a persistent registry, and stopping when the false-positive rate signals entropy exhaustion (P2 floor). Single-pass T2 experiments on eight external codebases (vLLM, LangChain, Gradio, MLflow, Superset, LiteLLM, Dify, Open WebUI) yield a 1.5--2.5x exclusive discovery advantage over same-axis baselines, with 85--100% non-overlap between rotation and repetition findings. A resource-equivalent controlled experiment (T13-multi) comparing OAR against four genuinely diverse-topic security prompts on the same eight codebases yields a 6/8 win rate (mean Delta E = +1.125, mean Jaccard = 0.187), confirming that axis rotation provides manifold coverage distinct from topic diversity. Full-campaign rotation on a 350K-line production codebase (156+ sessions) produces a cumulative 3--5x yield advantage as the saturation curve compounds across axes. No single axis captures >30% of critical findings in any codebase. The method generalizes across Python, TypeScript, Java, and nine structurally disjoint application domains. Beyond empirical yield, OAR provides an epistemically grounded coverage framework for LLM-based auditing: each rotation makes a falsifiable behavioral claim about reducing the unseen vulnerability surface, a property no topic-diversity or prompt-variation strategy can replicate. (The claim is behavioral and operational --- grounded in the observed 85--100% non-overlap between rotation and repetition findings; it does not depend on confirmed activation-space measurement, which is the target of T4.)
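The rotation loop described above (pick a probe direction by cosine distance, accumulate confirmed defect classes in a registry, stop once the false-positive rate is high) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `next_axis`, `rotate`, the `audit` callback, and the `fp_threshold` value of 0.8 are all hypothetical, and the farthest-from-used-axes selection rule is one plausible reading of "selecting probe directions via cosine distance".

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def next_axis(axes, used):
    """Pick the unused axis whose nearest used axis is farthest away.

    axes: dict mapping axis name -> embedding vector (hypothetical layout).
    used: list of axis names already visited (including the generator axis).
    """
    remaining = [name for name in axes if name not in used]
    if not remaining:
        return None
    if not used:
        return remaining[0]
    return max(
        remaining,
        key=lambda n: min(cosine_distance(axes[n], axes[u]) for u in used),
    )


def rotate(axes, audit, generator_axis, fp_threshold=0.8):
    """Rotate through axes, away from the generator axis.

    audit(axis_name) is an assumed callback returning
    (confirmed_defect_classes, rejected_count) for one audit pass.
    fp_threshold is an assumed stand-in for the stopping floor.
    """
    registry = set()          # persistent registry of confirmed defect classes
    used = [generator_axis]   # the generator axis itself is never probed
    while True:
        axis = next_axis(axes, used)
        if axis is None:
            break
        used.append(axis)
        confirmed, rejected = audit(axis)
        registry |= set(confirmed)
        total = len(confirmed) + rejected
        # Stop once most reported findings are false positives.
        if total and rejected / total >= fp_threshold:
            break
    return registry, used[1:]
```

In this sketch the most distant axis is probed first, so a direction nearly parallel to the generator axis is visited last, which is where the stopping rule is most likely to trigger.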
A vocabulary-matched controlled experiment (T1, 6/6 cells, 3 models) provides directionally consistent evidence that axis direction, not vocabulary specificity, drives the effect (pilot result; no per-cell p-values or power analysis; formal replication at scale is T1-ext). Persistent homology (T6) confirms the underlying manifold topology is generic to LLM semantic organization, not defect-specific --- predicting domain-general applicability. Session-touch count predicts defect density (r ≈ 0.71, p = 9.0), all from axis directions the same-axis baseline structurally cannot reach. The theoretical apparatus underlying these results --- 92 conjectures on LLM manifold geometry, the formal consequences of GAS and CCD, and the epistemological foundations of oracle-free auditing --- is developed in the companion paper.
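The overlap figures quoted in the abstract (Jaccard, non-overlap, exclusive discoveries) are set comparisons between the findings of two auditing strategies. A minimal sketch of such a comparison follows. The abstract does not define Delta E, so `exclusive_difference` below is only an assumed reading (rotation-only findings minus baseline-only findings), and the finding identifiers are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def exclusive_difference(rotation, baseline):
    """Assumed reading of an exclusive-discovery gap:
    findings only rotation made, minus findings only the baseline made."""
    rotation, baseline = set(rotation), set(baseline)
    return len(rotation - baseline) - len(baseline - rotation)


# Hypothetical finding identifiers, for illustration only.
rotation_findings = {"a", "b", "c", "d"}
baseline_findings = {"c", "e"}

overlap = jaccard(rotation_findings, baseline_findings)
gap = exclusive_difference(rotation_findings, baseline_findings)
```

A low Jaccard value together with a positive gap would indicate that the two strategies find mostly different defects and that rotation finds more of the unshared ones.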
Martin Brodeur
Martin Brodeur (Mon,) studied this question.
www.synapsesocial.com/papers/69e07e992f7e8953b7cbf72c — DOI: https://doi.org/10.5281/zenodo.19561632