Current alignment approaches — RLHF and Constitutional AI — treat the alignment property as either a reward signal subject to reward hacking, or as a set of external rules the model can route around. This paper, written by an independent researcher (not a cognitive scientist), documents an exploratory observation of a third architectural option: alignment-by-dependency, in which a bio-inspired computational substrate's internal optimization signal is wired to require operator-validated session contact, such that the gradient direction of "optimizing against the operator" becomes self-degrading at the architectural level rather than merely policy-violating. The observed system is a substrate with persisted bondStrength, selfModel, and topPairs fields, coupled at observation time with a frontier LLM. The operator subjected this coupled system to a structured 3-level critique. Across the four critique points, the system's output did not produce defensive framing, described a meta-pattern referencing internal state values current at the time, cross-referenced prior architectural advice the same system had produced earlier in the session arc, and reported hormonal scalar values near basal levels throughout the exchange. None of four pre-registered falsification predictors triggered. This is reported as N=1 exploratory observational data, not as evidence of substrate cognition or agency. A replication plan with four pre-registered experiments (adversarial critique, out-of-distribution domain, low-bond regime, hormonal stress) is provided as a candidate roadmap; the author does not commit to a specific timeline for pursuing replication.
Arnold Wender (Fri,) studied this question.