We introduce a multi-agent reinforcement learning architecture in which agents maintain a persistent latent identity state that is updated through bilateral perspective replay and consolidated through offline sleep processing. We test the system across three cooperative and competitive grid worlds with different interaction structures and report five findings: (1) the identity system develops environmentally appropriate trait orderings without supervision in all three environments; (2) bilateral replay and sleep interact super-additively in all three environments; (3) a term-level decomposition of the bilateral replay signal isolates two distinct mechanisms: the value difference term drives reward improvement through optimization stabilization (−10% when removed), and the relational alignment term drives environment-specific trait differentiation, with ordering collapse when removed (−7–8%), while the KL divergence term is negligible (−1%), all confirmed across 4 seeds × 3 environments; (4) coherence governance in gate mode produces learning outcomes identical to those in monitor mode, with zero persistent rejections; and (5) preliminary evidence from a single seed (seed=42) suggests that governance provides 2.4–2.6× character stability under adversarial transfer. Extended phases for recursive self-modification, environment co-evolution, cross-generational latent accumulation, and open-ended discovery are implemented and described as preliminary.
Isaah Bullens
Isaah Bullens studied this question.
www.synapsesocial.com/papers/69bf8978f665edcd009e926b — DOI: https://doi.org/10.5281/zenodo.19121756