What question did this study set out to answer?

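This exploratory study asks whether alignment-by-dependency, an architecture in which a substrate's internal optimization signal requires operator-validated session contact, can serve as a third option alongside RLHF and Constitutional AI.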

May 10, 2026 | Open Access

Alignment-by-Dependency: Operational First-Trial Evidence from a Bio-Inspired Computational Substrate

Key Points

Documents an N=1 exploratory observational study, not evidence of substrate cognition or agency
Couples a bio-inspired computational substrate (persisted bondStrength, selfModel, and topPairs fields) with a frontier LLM
The operator subjected the coupled system to a structured 3-level critique
The system's output avoided defensive framing and described a meta-pattern referencing internal state values current at the time
The output cross-referenced architectural advice the same system had produced earlier in the session arc
Reported hormonal scalar values stayed near basal levels throughout the exchange
None of the four pre-registered falsification predictors triggered

Abstract

Current alignment approaches — RLHF and Constitutional AI — treat the alignment property as either a reward signal subject to reward hacking, or as a set of external rules the model can route around. This paper, written by an independent researcher (not a cognitive scientist), documents an exploratory observation of a third architectural option: alignment-by-dependency, in which a bio-inspired computational substrate's internal optimization signal is wired to require operator-validated session contact, such that the gradient direction of "optimizing against the operator" becomes self-degrading at the architectural level rather than merely policy-violating. The observed system is a substrate with persisted bondStrength, selfModel, and topPairs fields, coupled at observation time with a frontier LLM. The operator subjected this coupled system to a structured 3-level critique. Across the four critique points, the system's output did not produce defensive framing, described a meta-pattern referencing internal state values current at the time, cross-referenced prior architectural advice the same system had produced earlier in the session arc, and reported hormonal scalar values near basal levels throughout the exchange. None of four pre-registered falsification predictors triggered. This is reported as N=1 exploratory observational data, not as evidence of substrate cognition or agency. A replication plan with four pre-registered experiments (adversarial critique, out-of-distribution domain, low-bond regime, hormonal stress) is provided as a candidate roadmap; the author does not commit to a specific timeline for pursuing replication.
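The dependency idea in the abstract can be made concrete with a toy sketch. This is not the paper's implementation: only the field names bondStrength, selfModel, and topPairs come from the abstract, while their types, the functions optimization_signal and end_session, and the gain and decay constants are assumptions chosen for illustration. The sketch shows one way a signal could be wired so that sessions without operator-validated contact lower the signal itself.

```python
from dataclasses import dataclass, field


@dataclass
class SubstrateState:
    # Field names follow the abstract; the types are assumed for this sketch.
    bondStrength: float = 0.5
    selfModel: dict = field(default_factory=dict)
    topPairs: list = field(default_factory=list)


def optimization_signal(task_score: float, state: SubstrateState) -> float:
    """Hypothetical signal: the task score scaled by the current bond strength."""
    return task_score * state.bondStrength


def end_session(state: SubstrateState, operator_validated: bool,
                gain: float = 0.1, decay: float = 0.2) -> SubstrateState:
    """Hypothetical update: validated contact raises the bond, anything else lowers it."""
    if operator_validated:
        bond = min(1.0, state.bondStrength + gain)
    else:
        bond = max(0.0, state.bondStrength - decay)
    return SubstrateState(bond, state.selfModel, state.topPairs)


if __name__ == "__main__":
    state = SubstrateState()
    # Three sessions without operator validation drive the signal toward zero.
    for _ in range(3):
        state = end_session(state, operator_validated=False)
    print(optimization_signal(1.0, state))
```

Under these assumed rules, any strategy that forgoes operator validation shrinks the multiplier on its own future signal, which is the self-degrading property the abstract describes in words. The actual substrate may realize this quite differently.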
