Same-axis LLM auditing fails structurally: the generating model and the auditing model share the same compression geometry, creating systematic blind spots that no amount of same-direction prompting can escape. Self-consistency, chain-of-thought, query-by-committee, adversarial red-teaming, and multi-agent debate all operate within the same manifold region. They serve their intended purposes (hallucination reduction, reasoning transparency, calibration) but do not address axis-specific systematic omission. We call this Generator-Auditor Symmetry (GAS): a structural tendency consistent with a compression-geometry hypothesis, not mere miscalibration.

The method is a three-element compound system: (a) a prospective geometric separation criterion enforcing cosine distance > 0.6 between probe axes before execution; (b) a persistent cross-session causal defect class registry accumulating findings indexed by root mechanism; and (c) an entropy exhaustion stopping criterion derived from that registry. No individual element generates the completeness signal alone. The practitioner's role is Steward of the Axis: locus selection and coverage tracking. The LLM traverses; the human steers.

Evidence. Controlled Tier-1 experiments (n=4): orthogonal probing yielded 39% greater lexical escape from saturated output; a vocabulary-matched baseline confirmed the gain is driven by axis direction, not vocabulary specificity (6/6 cells, 3 models). Production campaign (T1b, 36-hour, 156+ probe waves, 75+ surfaces, 350,000-line TypeScript codebase): ~80% per-surface bug-class discovery yield vs. ~20% same-axis, a 4–5× advantage (single-codebase observation; T2 independent replication pre-registered and pending; patent claim lower bound: 3×). Cross-codebase pilot (April 6, 2026): OAR applied to the psf/requests Python library (833 lines, zero inventor contribution) across 4 LLM families (Grok-3, Gemini 3 Flash, Perplexity sonar-pro, Mistral Large) yielded 29–52 unique defect classes per model vs. 11 for the same-axis baseline, confirming cross-language and cross-model generalization.

Stopping criterion: entropy exhaustion is signaled when any two of the following hold: (i) >40% false-positive rate on a full 3-axis round; (ii) zero new critical findings on two consecutive waves; (iii) finding complexity collapsing to single-parameter variants. The FP rate is the entropy meter; no external ground-truth oracle is required.

Session-touch count (git log) predicts P0 density (r≈0.71, n=18 feature families, 95% CI [0.36, 0.88], p<0.001). Persistent homology (Vietoris-Rips, 58 production bug classes) yields 20 significant β₁ features; this is illustrative only, since it operates on bug-class name embeddings, not LLM activation space. The empirical case rests on Tier 1 and Tier 2 alone. A 1,158-class production defect taxonomy was accumulated across 200+ surfaces. The paper highlights 12 falsifiable conjectures (C1–C9, C59, C62–C63) grounded in production data; the full conjecture set (91 total; C88–C91 added April 2026) is in the companion theoretical paper (philpapers.org/archive/BROTLM-3.pdf).

Limitations acknowledged: single-codebase T1b observation; author-as-rater for P2 findings (T3 independent adjudication pre-registered); T2 cross-codebase replication pending; persistent homology illustrative only (T6 bootstrap pending); GAS mechanism inferred, not directly measured (T4 pending).

Patent: Methods disclosed are the subject of two U.S. Provisional Patent Applications filed April 5–6, 2026 (No. 64/029,703 and related filing). Personal, academic, and non-commercial research use expressly permitted. Commercial licensing: contact admin@fluentlogic.org.
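
As an illustration only, the separation criterion (a) and the two-of-three stopping rule could be sketched as follows. The abstract does not say how probe axes are embedded or how the checks are implemented, so every name here (`axes_are_separated`, `entropy_exhausted`, the plain-list vectors) is hypothetical, and applying the 0.6 bound to every pair of axes is an assumption.

```python
import math
from itertools import combinations

SEPARATION_THRESHOLD = 0.6   # cosine-distance bound quoted in the abstract
FP_RATE_THRESHOLD = 0.40     # false-positive bound from stopping condition (i)


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("axis embeddings must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def axes_are_separated(axes, threshold=SEPARATION_THRESHOLD):
    """True when every pair of axis embeddings exceeds the threshold.

    Assumption: the criterion is checked pairwise before any probe runs.
    """
    return all(cosine_distance(u, v) > threshold
               for u, v in combinations(axes, 2))


def entropy_exhausted(fp_rate, new_criticals_by_wave, single_param_collapse):
    """Two-of-three stopping rule.

    fp_rate: false-positive rate (0..1) on the latest full 3-axis round.
    new_criticals_by_wave: new critical findings per wave, oldest first.
    single_param_collapse: reviewer judgement that findings have collapsed
        to single-parameter variants (assumed to be supplied by a human).
    """
    last_two = new_criticals_by_wave[-2:]
    signals = [
        fp_rate > FP_RATE_THRESHOLD,                           # (i)
        len(last_two) == 2 and all(n == 0 for n in last_two),  # (ii)
        bool(single_param_collapse),                           # (iii)
    ]
    return sum(signals) >= 2
```

Under these assumptions, a campaign would call `axes_are_separated` once before a round and `entropy_exhausted` after each wave; both functions take plain numbers and need no ground-truth oracle, in line with the abstract's description.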
Martin Brodeur
Martin Brodeur studied this question.
www.synapsesocial.com/papers/69d895046c1944d70ce0609f — DOI: https://doi.org/10.5281/zenodo.19446707