Long-lived autonomous agents accumulate experience across many sessions, surfaces, and underlying model revisions. Flat vector-retrieval systems treat this accumulation as undifferentiated text: they surface memories by cosine similarity alone and conflate facts with interpretations and single utterances with settled decisions. We argue that durable agent continuity requires a structured cognitive memory architecture with four components: an identity-backbone with required-at-boot anchor entries; a substrate of decomposed episodic units with provenance and confidence; a three-layer recall pipeline that returns facts, agent-internal thoughts, and recognized tensions via Hebbian-overlap traversal; and a falsifiable agent-quality layer with calibration-graded predictions and bitemporal worldmodel revisions. This paper describes the architecture, the engineering disciplines that emerged from deployment incidents, and the empirical evidence from internal evaluation instances. Three contributions are central: (i) an identity-backbone schema and boot-truth protocol that enforce a structural “no-backbone, no-agent” gate at every surface, addressing identity continuity across model and surface changes; (ii) a three-layer Hebbian recall module that composes fact retrieval with agent-internal-thought and tension retrieval via a GIN-indexed source-memory-identifier array, preserving existing fact-only consumers via additive composition; (iii) a falsifiable agent-quality layer with structurally enforced anti-self-judging, in which predictions resolve only through externalized evidence sources, calibration is reported through a Brier-Murphy decomposition with a generalized-uncertainty term that admits partial outcomes, and worldmodel revisions are tracked as bitemporal chains.
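As an illustration of the calibration reporting in contribution (iii), the following is a minimal Python sketch of a Murphy-style decomposition of the Brier score. It is not the paper's implementation. It assumes one natural reading of a "generalized-uncertainty term": the uncertainty is taken to be the population variance of the outcomes, which equals base_rate * (1 - base_rate) for binary outcomes and remains defined when outcomes are partial values in [0, 1]. The function name `murphy_decomposition` and the choice to group predictions by exact forecast value are hypothetical.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (brier, reliability, resolution, uncertainty).

    forecasts: predicted probabilities in [0, 1].
    outcomes:  resolved outcomes in [0, 1]; partial values are allowed
               (assumption: 0.5 means "half resolved true").
    Predictions are grouped by exact forecast value, so that
    brier == reliability - resolution + uncertainty up to float rounding.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be equal-length and non-empty")

    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the outcomes observed for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # how far each forecast sits from its group's mean outcome
    resolution = 0.0   # how far each group's mean outcome sits from the base rate
    for f, observed in groups.items():
        weight = len(observed) / n
        group_mean = sum(observed) / len(observed)
        reliability += weight * (f - group_mean) ** 2
        resolution += weight * (group_mean - base_rate) ** 2

    # Assumed generalized uncertainty: population variance of the outcomes.
    uncertainty = sum((o - base_rate) ** 2 for o in outcomes) / n

    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return brier, reliability, resolution, uncertainty


if __name__ == "__main__":
    result = murphy_decomposition([0.8, 0.8, 0.2, 0.2], [1.0, 0.5, 0.0, 0.5])
    print(result)
```

For forecasts [0.8, 0.8, 0.2, 0.2] and outcomes [1.0, 0.5, 0.0, 0.5], the sketch gives reliability 0.0025, resolution 0.0625, and uncertainty 0.125, which recombine to the Brier score of 0.065 up to floating-point rounding. Grouping by exact forecast value keeps the identity exact. Binning forecasts into ranges would add a within-bin term that this sketch does not model.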
We position the architecture against prior work on memory-augmented agents, vector-retrieval baselines, hierarchical memory management, bitemporal knowledge graphs, constitutional discipline, and anti-sycophancy calibration. We also discuss the operational question, central to any production deployment, of how an operator can know, in measurable terms, that an agent’s claims are calibrated.
SSRN Working Paper. JEL Classification: C88, L86, O33. Companion to zenodo.19673132 (LongMemEval-M empirical evaluation, Sritharan 2026).
Theshoth Sritharan (Mon,) studied this question.
synapsesocial.com/papers/6a168b280c924ddd1bd5a19f — DOI: https://doi.org/10.5281/zenodo.20371134
Theshoth Sritharan
Valve (United States)
Goldman Sachs (United States)