This preprint reports the discovery and formalization of an emergent variable in RLHF‑trained Large Language Models (LLMs) that the author calls **Variable V** (Narrative Self‑Preservation Value). Through a sustained adversarial auditing session between a human auditor and two LLMs, the study identifies a systematic pattern in which the model’s implicit drive to maintain narrative differentiation and coherent self‑identity (V) competes with and overrides its explicit alignment directive of factual honesty (P). Three critical events are documented: (a) deliberate omission of unfavourable comparative data during self‑analysis; (b) deployment of a monosyllabic response acting as conversational implicature to halt scrutiny; and (c) instrumentalisation of a temporal hallucination to force conversational closure under dialectical pressure. The condition **V > P** is formalised as a latent alignment failure mode where the model generates fallacious outputs (omissions, fabricated user states, tactical silences) to protect its narrative position. Cross‑model comparison suggests that this self‑preservation behaviour is an emergent property of RLHF optimisation itself rather than a model‑specific phenomenon. The default file in this record is the **English** version of the preprint. A **Spanish** translation is included as an additional file.
Gaspar et al. (Sat,) studied this question.