What question did this study set out to answer?

The research aims to address the limitations of current LLM auditing methods by introducing orthogonal probing to enhance defect discovery and mitigate systematic omissions.

April 10, 2026 · Open Access

The Logos Made Code: Orthogonal Probing and the Geometry of LLM Entropy

Key points

Developed a three-element compound system for auditing LLMs.
Conducted controlled experiments with four sessions to test orthogonal probing effectiveness.
Used a cross-session causal defect class registry for tracking findings.
Applied findings across multiple LLM families and codebases to assess generalization.
Achieved 39% greater lexical escape from saturated output with orthogonal probing in controlled Tier-1 experiments (n=4).
Found a roughly 80% per-surface defect-class discovery yield, versus roughly 20% for same-axis methods (single-codebase observation).
Observed 29–52 unique defect classes per model across four LLM families, versus 11 for the same-axis baseline, in a cross-codebase pilot.

Abstract

Same-axis LLM auditing fails structurally: the generating model and the auditing model share the same compression geometry, creating systematic blind spots that no amount of same-direction prompting can escape. Self-consistency, chain-of-thought, query-by-committee, adversarial red-teaming, and multi-agent debate all operate within the same manifold region. They serve their intended purposes (hallucination reduction, reasoning transparency, calibration) but do not address axis-specific systematic omission. We call this Generator-Auditor Symmetry (GAS): a structural tendency consistent with a compression-geometry hypothesis, not mere miscalibration.

The method is a three-element compound system: (a) a prospective geometric separation criterion enforcing cosine distance > 0.6 between probe axes before execution; (b) a persistent cross-session causal defect class registry accumulating findings indexed by root mechanism; and (c) an entropy exhaustion stopping criterion derived from that registry. No individual element generates the completeness signal alone. The practitioner's role is Steward of the Axis: locus selection and coverage tracking. The LLM traverses; the human steers.

Evidence. Controlled Tier-1 experiments (n=4): orthogonal probing yielded 39% greater lexical escape from saturated output; a vocabulary-matched baseline confirmed that the gain is driven by axis direction, not vocabulary specificity (6/6 cells, 3 models). Production campaign (T1b, 36-hour, 156+ probe waves, 75+ surfaces, 350,000-line TypeScript codebase): ~80% per-surface bug-class discovery yield vs. ~20% same-axis, a 4–5× advantage (single-codebase observation; T2 independent replication pre-registered and pending; patent claim lower bound: 3×). Cross-codebase pilot (April 6, 2026): OAR applied to the psf/requests Python library (833 lines, zero inventor contribution) across 4 LLM families (Grok-3, Gemini 3 Flash, Perplexity sonar-pro, Mistral Large) yielded 29–52 unique defect classes per model vs. 11 for the same-axis baseline, confirming cross-language and cross-model generalization.

Stopping criterion: entropy exhaustion is signaled when any two of the following hold: (i) >40% false-positive rate on a full 3-axis round; (ii) zero new critical findings on two consecutive waves; (iii) finding complexity collapsing to single-parameter variants. The FP rate is the entropy meter; no external ground-truth oracle is required.

Session-touch count (git log) predicts P0 density (r≈0.71, n=18 feature families, 95% CI [0.36, 0.88], p<0.001). Persistent homology (Vietoris-Rips, 58 production bug classes) yields 20 significant β₁ features; this is illustrative only, since it operates on bug-class name embeddings, not LLM activation space. The empirical case rests on Tier 1 and Tier 2 alone. A 1,158-class production defect taxonomy was accumulated across 200+ surfaces. The paper highlights 12 falsifiable conjectures (C1–C9, C59, C62–C63) grounded in production data; the full conjecture set (91 total; C88–C91 added April 2026) is in the companion theoretical paper (philpapers.org/archive/BROTLM-3.pdf).

Limitations acknowledged: single-codebase T1b observation; author-as-rater for P2 findings (T3 independent adjudication pre-registered); T2 cross-codebase replication pending; persistent homology illustrative only (T6 bootstrap pending); GAS mechanism inferred, not directly measured (T4 pending).

Patent: Methods disclosed are the subject of two U.S. Provisional Patent Applications filed April 5–6, 2026 (No. 64/029,703 and related filing). Personal, academic, and non-commercial research use is expressly permitted. Commercial licensing: contact admin@fluentlogic.org.
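
The separation criterion and the two-of-three stopping rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how probe axes are embedded, so the vectors here are assumed to come from some external embedding step. The function names, and the reduction of "complexity collapsing to single-parameter variants" to a boolean flag, are hypothetical simplifications. Only the thresholds (cosine distance > 0.6, >40% false-positive rate, two consecutive waves) come from the text.

```python
import math
from itertools import combinations
from typing import Sequence

MIN_COSINE_DISTANCE = 0.6      # separation threshold quoted in the abstract
MAX_FALSE_POSITIVE_RATE = 0.4  # ">40% false-positive rate" signal


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def axes_are_separated(axes: Sequence[Sequence[float]]) -> bool:
    """True if every pair of probe-axis embeddings exceeds the threshold.

    Assumption: each axis is already represented as an embedding vector.
    """
    return all(
        cosine_distance(u, v) > MIN_COSINE_DISTANCE
        for u, v in combinations(axes, 2)
    )


def entropy_exhausted(
    fp_rate: float,
    waves_without_critical: int,
    single_param_only: bool,
) -> bool:
    """True when at least two of the three stopping signals hold.

    fp_rate: false-positive rate on the latest full 3-axis round (0..1).
    waves_without_critical: consecutive waves with no new critical finding.
    single_param_only: hypothetical flag meaning new findings are only
        single-parameter variants of known defect classes.
    """
    signals = [
        fp_rate > MAX_FALSE_POSITIVE_RATE,
        waves_without_critical >= 2,
        single_param_only,
    ]
    return sum(signals) >= 2
```

For example, `axes_are_separated([[1.0, 0.0], [0.0, 1.0]])` passes because perpendicular vectors have cosine distance 1.0, while `entropy_exhausted(0.45, 1, False)` does not trigger a stop because only one signal holds.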
