What question did this study set out to answer?

This study asks whether axis-rotation auditing detects more issues in LLM-generated artifacts than auditing along the same semantic direction used to generate them.

April 16, 2026 · Open Access

Orthogonal Probing and the Geometry of LLM Entropy: Empirical Evidence for Axis-Rotation Auditing of LLM-Generated Artifacts

Key Points

Implemented Orthogonal Auditing by Rotation (OAR) to select probe directions via cosine distance in embedding space.
Conducted single-pass T2 experiments on eight external codebases to measure discovery advantage over same-axis methods.
Performed a controlled experiment comparing OAR against diverse-topic security prompts across the same codebases.
Achieved a 1.5--2.5x exclusive discovery advantage over same-axis baselines, with 85--100% non-overlap between rotation and repetition findings.
Recorded a 6/8 win rate for OAR against diverse-topic security prompts, indicating that axis rotation provides manifold coverage distinct from topic diversity.
Produced a cumulative 3--5x yield advantage on a 350K-line production codebase, with no single axis capturing over 30% of critical findings.
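
The overlap figures in the key points can be illustrated with ordinary set arithmetic. The sketch below is an assumption about how such numbers might be computed, not the paper's own definitions: the function name `overlap_metrics`, the finding identifiers, and the reading of "non-overlap" as the share of rotation findings absent from the baseline are all hypothetical. The Jaccard index is the standard ratio of shared findings to all findings.

```python
def overlap_metrics(rotation_findings, baseline_findings):
    """Compare two sets of finding identifiers (hypothetical helper).

    Returns the Jaccard index, the exclusive finding counts for each
    method, and the share of rotation findings the baseline missed.
    """
    rotation = set(rotation_findings)
    baseline = set(baseline_findings)
    shared = rotation & baseline
    union = rotation | baseline
    rotation_only = rotation - baseline
    baseline_only = baseline - rotation
    return {
        # Standard Jaccard index: |A ∩ B| / |A ∪ B|
        "jaccard": len(shared) / len(union) if union else 0.0,
        "rotation_exclusive": len(rotation_only),
        "baseline_exclusive": len(baseline_only),
        # Assumed reading of "non-overlap": rotation findings not in baseline
        "non_overlap": len(rotation_only) / len(rotation) if rotation else 0.0,
    }


# Made-up finding identifiers, for illustration only
metrics = overlap_metrics({"A", "B", "C", "D"}, {"D", "E"})
print(metrics)
```

Under this reading, an exclusive discovery advantage would correspond to the ratio of `rotation_exclusive` to `baseline_exclusive`. That ratio is again an interpretation, not a definition taken from the paper.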

Abstract

When LLMs audit artifacts from the same semantic direction used to generate them, they re-enter the same compressed manifold region, producing rising confidence while discovery stalls --- a failure mode we term Generator-Auditor Symmetry (GAS). Orthogonal Auditing by Rotation (OAR) escapes this by selecting probe directions via cosine distance in embedding space, accumulating confirmed defect classes in a persistent registry, and stopping when the false-positive rate signals entropy exhaustion (P2 floor). Single-pass T2 experiments on eight external codebases (vLLM, LangChain, Gradio, MLflow, Superset, LiteLLM, Dify, Open WebUI) yield a 1.5--2.5x exclusive discovery advantage over same-axis baselines, with 85--100% non-overlap between rotation and repetition findings. A resource-equivalent controlled experiment (T13-multi) comparing OAR against 4 genuinely diverse-topic security prompts on the same 8 codebases yields a 6/8 win rate (mean Delta E = +1.125, mean Jaccard = 0.187), confirming that axis rotation provides manifold coverage distinct from topic diversity. Full-campaign rotation on a 350K-line production codebase (156+ sessions) produces a cumulative 3--5x yield advantage as the saturation curve compounds across axes. No single axis captures >30% of critical findings in any codebase. The method generalizes across Python, TypeScript, Java, and nine structurally disjoint application domains. Beyond empirical yield, OAR provides an epistemically grounded coverage framework for LLM-based auditing: each rotation makes a falsifiable behavioral claim about reducing the unseen vulnerability surface, a property no topic-diversity or prompt-variation strategy can replicate. (The claim is behavioral and operational --- grounded in the observed 85--100% non-overlap between rotation and repetition findings; it does not depend on confirmed activation-space measurement, which is the target of T4.) 
A vocabulary-matched controlled experiment (T1, 6/6 cells, 3 models) provides directionally consistent evidence that axis direction, not vocabulary specificity, drives the effect (pilot result; no per-cell p-values or power analysis; formal replication at scale is T1-ext). Persistent homology (T6) confirms the underlying manifold topology is generic to LLM semantic organization, not defect-specific --- predicting domain-general applicability. Session-touch count predicts defect density (r ≈ 0.71, p = 9.0), all from axis directions the same-axis baseline structurally cannot reach. The theoretical apparatus underlying these results --- 92 conjectures on LLM manifold geometry, the formal consequences of GAS and CCD, and the epistemological foundations of oracle-free auditing --- is developed in the companion paper.
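
The abstract describes OAR as selecting probe directions via cosine distance in embedding space. A minimal sketch of one way such a selection could work is given below, assuming each candidate audit axis already has an embedding vector. The greedy farthest-point strategy, the function names, and the toy three-dimensional vectors are illustrative assumptions, not the paper's implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_probe_axes(generator_axis, candidates, k):
    """Greedily pick up to k candidate axes (hypothetical helper).

    Each step chooses the candidate whose minimum cosine distance to
    every axis already used (starting with the generator's own axis)
    is largest, so near-duplicates of earlier axes are picked last.
    """
    used = [generator_axis]
    remaining = dict(candidates)
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda name: min(
                cosine_distance(remaining[name], axis) for axis in used
            ),
        )
        chosen.append(best)
        used.append(remaining.pop(best))
    return chosen


# Toy embeddings, made up for illustration only
generator = [1.0, 0.0, 0.0]
candidate_axes = {
    "same_axis_rephrase": [0.9, 0.1, 0.0],
    "orthogonal_y": [0.0, 1.0, 0.0],
    "orthogonal_z": [0.0, 0.0, 1.0],
}
print(select_probe_axes(generator, candidate_axes, k=2))
```

In this toy case the near-parallel rephrasing is skipped in favor of the two orthogonal axes. The registry of confirmed defect classes and the false-positive stopping rule mentioned in the abstract are left out of the sketch because the abstract does not specify them.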
