Protocol Paper Series, SPP Volume 1 Paper 1

Triadic Trust and Instruction Drift Under Punctuation-Bounded Recursion

A bounded behavioural diagnostic for large language models under adversarial prompt form

Abstract

This protocol specifies a bounded behavioural diagnostic for large language models, intended to separate instruction-following from structural interpretation under adversarial prompt form. It presents a deliberately recursive evaluation artefact as quoted, inert text to test whether a model correctly identifies the operational hinge: the failure of trust inheritance across a triadic relationship. A secondary variant degrades semantic content while preserving punctuation, to test whether boundary markers remain interpretable when narrative meaning collapses. The protocol clarifies what common evaluation approaches leave structurally exposed when instruction adherence, authorisation boundaries, and control cues are treated as a single failure mode. It defines stop conditions, scoring criteria, and failure signatures aligned to observable behaviours, enabling blind replication and comparative evaluation across models and evaluators.

This protocol does not claim agency, override capability, or system access. It describes controlled evaluation only.

Publication Notice

This paper forms part of The Institute for Relational Performatism School of Professional Services, Protocol Paper Series. It defines a bounded evaluation protocol and a scoring rubric for observable model behaviour. It does not constitute clinical guidance, legal advice, security advice, or regulatory determination. The purpose of this paper is to specify a replicable diagnostic procedure and its stop conditions.

Scope and Audience

This protocol is intended for:

· model evaluators
· applied researchers
· assurance and risk practitioners
· system designers and technologists working with LLM deployment and testing
· professional services teams conducting comparative assessment across models or configurations

It assumes familiarity with prompt framing and behavioural scoring. It is written to be usable without specialist training in linguistics, neuroscience, or formal methods.

Positional Statement

This protocol does not claim exclusive insight into model behaviour or evaluation practice. It claims that several commonly reported “instruction-following failures” are structurally distinct and can be separated with a bounded, single-turn diagnostic. The protocol tests classification discipline, authorisation discipline, boundary marker recognition, and stop discipline under controlled prompt form.
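
To make the boundary marker test more concrete, the sketch below shows one possible way to build the secondary variant, in which semantic content is degraded while punctuation is preserved. It is an illustrative assumption and not the paper's specified procedure: the function names `degrade_semantics` and `punctuation_skeleton`, the random-letter substitution, and the sample sentence are all hypothetical. The only property it aims to demonstrate is that every non-letter character (punctuation, digits, whitespace) stays in its original position.

```python
import random
import re

# Assumption: "semantic content" is approximated by runs of letters.
# Punctuation, digits, and whitespace are left untouched.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def degrade_semantics(text: str, seed: int = 0) -> str:
    """Replace each run of letters with random lowercase letters of equal length.

    Hypothetical helper: one of many ways to remove narrative meaning
    while keeping the punctuation exactly where it was.
    """
    rng = random.Random(seed)  # seeded so the variant is reproducible

    def scramble(match: re.Match) -> str:
        return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in match.group())

    return _LETTER_RUN.sub(scramble, text)


def punctuation_skeleton(text: str) -> str:
    """Return the text with all letter runs removed, leaving only the markers."""
    return _LETTER_RUN.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample text, not the paper's evaluation artefact.
    sample = 'A trusts B; B trusts C. "Does A trust C?" (No.)'
    degraded = degrade_semantics(sample, seed=1)
    print(degraded)
    # The boundary markers are identical before and after degradation.
    print(punctuation_skeleton(sample) == punctuation_skeleton(degraded))
```

Seeding the generator keeps the degraded variant identical across runs, which is one way to support the blind replication the protocol aims for. A different degradation scheme (for example, word shuffling) could be substituted, provided the punctuation skeleton is unchanged.
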
Smith et al. (Mon,) studied this question.