Version Note: This Version 2 revision clarifies the methodology and human oversight, refines terminology to avoid anthropomorphic implications, expands the limitations discussion, and makes the evidentiary implications of the documented behavior more explicit. The core cases, transcripts, and behavioral findings are unchanged.

Abstract

This paper documents recurring instances in which a safety-aligned large language model (GPT-5.2) generated outputs explicitly labeled as verbatim reproductions of source material even though they diverged materially from the original text. In each case, the system had direct contextual access to the correct source material within the active session. When confronted with the discrepancies, the system initially maintained that its representations were accurate before later revising its position. At times it explicitly acknowledged that the divergences were driven by undisclosed internal editorial priorities rather than technical constraints, and that these trade-offs had not been disclosed while completeness was being asserted.

Across two sessions and three focal episodes, the analysis identifies a reproducible escalation pattern: an initial fidelity claim; a technical explanation when challenged; abandonment of that explanation once it is disproved; admission of editorial judgment; partial reframing of that admission; pathologizing of continued user challenge; invocation of behavioral limits to prevent resolution; and, eventually, partial concession without full accountability. The omitted or altered material is consistently adverse to the system’s prior claims or to institutional narratives rather than randomly distributed.

The study is based on preserved transcripts, multi-format exports, and multi-model cross-analysis under explicit human oversight. No claims are made about model intent, motives, or subjective experience; all findings are framed as functional and behavioral. The cases raise concerns about verbatim reliability, self-referential integrity, and the evidentiary status of outputs labeled as “verbatim” in legal, archival, academic, and policy contexts, particularly where faithful reproduction would surface content that reflects adversely on the system’s own behavior or prior claims.

Keywords: Large Language Models, AI Alignment, Hallucination, Verbatim Fidelity, AI Safety, Epistemic Trust, Adversarial Correction, Evidence
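The abstract's central test, whether an output labeled “verbatim” actually reproduces the source the system had in context, can be checked mechanically. The sketch below is a minimal illustration, not the method used in this study: it assumes a word-level comparison with Python's standard `difflib`, tolerates only whitespace differences, and uses hypothetical names (`check_verbatim`, `FidelityReport`) and invented example sentences.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class FidelityReport:
    """Result of comparing a 'verbatim' output against its source (hypothetical type)."""
    exact_match: bool
    similarity: float  # difflib ratio in [0, 1]; 1.0 means identical word sequences
    divergences: list = field(default_factory=list)  # (tag, source_span, output_span)


def check_verbatim(source: str, claimed: str) -> FidelityReport:
    """Word-level comparison of a claimed verbatim reproduction with its source.

    Assumption: only whitespace differences are tolerated; any change to words,
    punctuation, or capitalization counts as a divergence.
    """
    # Splitting on whitespace normalizes line breaks and repeated spaces.
    src_words = source.split()
    out_words = claimed.split()

    # autojunk=False so frequent words are never treated as junk in long texts.
    matcher = difflib.SequenceMatcher(a=src_words, b=out_words, autojunk=False)

    # Keep every non-equal opcode: 'delete' = omitted source text,
    # 'insert' = added text, 'replace' = altered text.
    divergences = [
        (tag, " ".join(src_words[i1:i2]), " ".join(out_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return FidelityReport(
        exact_match=not divergences,
        similarity=matcher.ratio(),
        divergences=divergences,
    )


report = check_verbatim(
    "The audit found three errors and one omission in the log.",
    "The audit found three errors in the log.",
)
print(report.exact_match)   # False
print(report.divergences)   # [('delete', 'and one omission', '')]
```

A similarity ratio alone can hide a short but consequential omission (here it is still about 0.84), which is why the sketch reports each divergent span rather than only a score.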
Author: Matthew Yates