Stateful AI systems — persistent research assistants, long-running agents, multi-sessioncopilots — accumulate knowledge that must survive across sessions. Flat vector retrieval isstructurally unfit for this. Embedding splits relational scenes across chunk boundaries, so aquery about an event may recover its location but lose the participants. Separately, storesthat consolidate and deduplicate over time quietly forget any fact that has not been queriedrecently. Both failures grow worse with use, and neither responds to threshold tuning.Hippo-Cortex pairs a lossy semantic cache (Mem0 + Qdrant) with a permanentrelational graph (Kuzu), following the hippocampal–neocortical division in memory systems:fast and lossy on top, slow and permanent below. The graph is the source of truth; thecache is rebuilt from it and feeds back into it via reconsolidation, so it warms with use ratherthan decaying. A three-tier retrieval chain escalates from sub-250 ms cache lookup, throughdecay-weighted BFS graph traversal, to web ingestion behind a human-in-the-loop gate.Rather than decomposing text into atomic triples, the system stores complete Events —one node per scene, holding participants, locations, causal links, and outcome together —alongside static Properties on entity nodes. The same sentence that produces six discon-nected triples is stored once, intact. A five-pass entity linker with a Stub → Emerging → Es-tablished node lifecycle prevents the graph from fragmenting as documents accumulate.On a 40-question multi-hop benchmark (8 question types, news corpus), hit rate reaches57.5% and event-participant accuracy reaches 83%, up from 33%. Profiling on local 768-dimensional embeddings (nomic-embed-text) shows the 0.85 confidence floor from theOpenAI embedding literature must drop to 0.55–0.70 for local models to reach comparablehit rates.
Syed Mohammad Meezaan Chishty (Sun,) studied this question.