Status: NeurIPS 2026 submission under double-blind review. Author identity anonymized.

Can a causal safety audit designed for one LLM transfer to others? We study cognitive policy oscillation, map an instability phase diagram (384 synthetic + 32 LLM conditions, with a sharp boundary at h approximately 0.2), and implement a three-component causal audit (WhyLab: C1 drift + C2 E-value filter + C3 Lyapunov damping). We then evaluate the fixed C2 audit across six LLM families (Gemini 2.0/2.5 Flash, GPT-4o-mini, Llama 3:8b, Llama 3.1:8b, Dolphin-Llama3:8b) on an identical adversarial fact-tracking benchmark. Only the Gemini family shows regression reduction (Gemini 2.0: +20.9%, p=0.088; Gemini 2.5: +100%, underpowered). GPT-4o-mini, all Llama 3 variants, and Dolphin-Llama3 show a null or negative audit effect. Paired accuracy is reduced on all six models (Cohen's d from -0.46 to -1.09). Rejection-rate mechanism: with an unchanged filter threshold, the rejection rate spans 0.5 to 13.75 per trajectory across models (a 27x spread), which identifies per-model threshold calibration as the binding deployment constraint.

Headline correction: the previously reported 44% regression reduction on Gemini 2.0 Flash is corrected to 20.9% under paired reanalysis of all 20 seeds; the result is no longer Bonferroni-significant after six-model family adjustment.

Contributions: (1) an instability phase diagram for self-improving LLM agents, (2) a cross-model reproducibility evaluation of a causal safety audit, (3) a rejection-rate mechanism analysis identifying per-model calibration as the binding constraint.

Change log (v3 vs v2): expanded the E7v2 benchmark to six LLM families; 60 paired seeds total; headline number corrected (44% -> 20.9%); abstract, introduction, and conclusion rewritten for cross-model framing; phase diagram preserved as the primary scientific artifact; Codex anonymization blockers removed.
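The Bonferroni remark and the 27x spread quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the family-wise level of 0.05 is an assumption (the abstract does not state it), and the helper names are hypothetical rather than taken from the WhyLab code.

```python
# Figures quoted in the abstract.
P_GEMINI_20 = 0.088     # p-value reported for Gemini 2.0 Flash
N_MODELS = 6            # number of models in the family adjustment
REJECTIONS_MIN = 0.5    # lowest rejection rate per trajectory
REJECTIONS_MAX = 13.75  # highest rejection rate per trajectory

# ASSUMPTION: family-wise significance level; not stated in the source.
ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value clears the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Ratio between the highest and lowest rejection rates across models.
spread = REJECTIONS_MAX / REJECTIONS_MIN

if __name__ == "__main__":
    print("adjusted threshold:", bonferroni_threshold(ALPHA, N_MODELS))
    print("Gemini 2.0 significant:", is_significant(P_GEMINI_20, ALPHA, N_MODELS))
    print("rejection-rate spread:", spread)
```

Under the assumed level, the adjusted per-model threshold is 0.05 / 6, well below the reported p=0.088, and the ratio 13.75 / 0.5 is 27.5, which matches the 27x spread.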
Anonymous Author
American Foundation for the Blind
Anonymous Author studied this question.
www.synapsesocial.com/papers/69eb092b553a5433e34b3b49 — DOI: https://doi.org/10.5281/zenodo.19687891