
June 9, 2026 · Open Access

Hippo-Cortex: A Dual-Layer Persistent Memory Engine for Stateful AI Applications

Key Points

This research aims to improve memory retention across sessions in stateful AI systems, addressing the limitations of flat vector retrieval.
Developed Hippo-Cortex, which combines a lossy semantic cache with a permanent relational graph.
Implemented a three-tier retrieval chain that escalates from cache lookup to graph traversal to human-gated web ingestion.
Evaluated performance on a 40-question multi-hop benchmark spanning eight question types.
Achieved a 57.5% hit rate; event-participant accuracy rose from 33% to 83%.
Showed that local embeddings need a confidence floor of 0.55–0.70, rather than the 0.85 used in the OpenAI embedding literature, to reach comparable hit rates.
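The three-tier retrieval chain listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Hippo-Cortex implementation: in-memory dicts stand in for the Mem0/Qdrant cache and the Kuzu graph, the 0.60 floor is an illustrative value from the reported 0.55–0.70 range, and the half-life decay, pruning threshold, hop limit, seed-node argument, and cache write-back are all hypothetical choices.

```python
import math
from collections import deque
from typing import Callable, Optional

# Illustrative floor: the abstract reports that the 0.85 floor from the OpenAI
# embedding literature must drop to 0.55-0.70 for local embeddings.
CONFIDENCE_FLOOR = 0.60


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def tier1_cache(query_vec, cache, floor) -> Optional[str]:
    """Tier 1: nearest cached answer, accepted only at or above the floor."""
    best_score, best_answer = 0.0, None
    for vec, answer in cache.values():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer if best_score >= floor else None


def tier2_graph(seed, graph, facts, half_life_days=30.0, min_weight=0.25, max_hops=3):
    """Tier 2 (hypothetical decay rule): BFS from a seed node where each edge
    scales the path weight by 0.5 ** (age / half_life); weak paths are pruned."""
    best = {seed: 1.0}
    queue = deque([(seed, 1.0, 0)])
    while queue:
        node, weight, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nbr, age_days in graph.get(node, []):
            w = weight * 0.5 ** (age_days / half_life_days)
            if w >= min_weight and w > best.get(nbr, 0.0):
                best[nbr] = w
                queue.append((nbr, w, hops + 1))
    hits = [(facts[n], w) for n, w in best.items() if n in facts]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def tier3_web(query_text, approve: Callable[[str], bool], fetch: Callable[[str], str]):
    """Tier 3: web ingestion, run only if a human approves the query."""
    return fetch(query_text) if approve(query_text) else None


def retrieve(query_text, query_vec, seed, cache, graph, facts, approve, fetch,
             floor=CONFIDENCE_FLOOR):
    """Escalate cache -> graph -> gated web; return (tier, answer)."""
    hit = tier1_cache(query_vec, cache, floor)
    if hit is not None:
        return "cache", hit
    paths = tier2_graph(seed, graph, facts)
    if paths:
        answer = paths[0][0]
        # Assumed write-back: graph answers warm the cache for repeat queries.
        cache[query_text] = (query_vec, answer)
        return "graph", answer
    web = tier3_web(query_text, approve, fetch)
    return ("web", web) if web is not None else ("miss", None)
```

Lowering the floor trades precision for cache hits: a query with cosine similarity of about 0.71 to a cached entry misses tier 1 at a floor of 0.85 and falls through to the graph, but hits the cache at 0.60.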

Abstract

Stateful AI systems, such as persistent research assistants, long-running agents, and multi-session copilots, accumulate knowledge that must survive across sessions. Flat vector retrieval is structurally unfit for this. Chunking for embedding splits relational scenes across chunk boundaries, so a query about an event may recover its location but lose the participants. Separately, stores that consolidate and deduplicate over time quietly forget any fact that has not been queried recently. Both failures grow worse with use, and neither responds to threshold tuning.

Hippo-Cortex pairs a lossy semantic cache (Mem0 + Qdrant) with a permanent relational graph (Kuzu), following the hippocampal–neocortical division in memory systems: fast and lossy on top, slow and permanent below. The graph is the source of truth; the cache is rebuilt from it and feeds back into it via reconsolidation, so it warms with use rather than decaying. A three-tier retrieval chain escalates from a sub-250 ms cache lookup, through decay-weighted BFS graph traversal, to web ingestion behind a human-in-the-loop gate.

Rather than decomposing text into atomic triples, the system stores complete Events (one node per scene, holding participants, locations, causal links, and outcome together) alongside static Properties on entity nodes. A sentence that would produce six disconnected triples is instead stored once, intact. A five-pass entity linker with a Stub → Emerging → Established node lifecycle prevents the graph from fragmenting as documents accumulate.

On a 40-question multi-hop benchmark (8 question types, news corpus), the hit rate reaches 57.5% and event-participant accuracy rises from 33% to 83%. Profiling on local 768-dimensional embeddings (nomic-embed-text) shows that the 0.85 confidence floor from the OpenAI embedding literature must drop to 0.55–0.70 for local models to reach comparable hit rates.
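The contrast between Events and atomic triples can be illustrated with a small data structure. This is a hypothetical Python sketch: the field names, the example scene (Alice and Bob signing an accord in Geneva), and the `as_triples` helper are illustrative assumptions, not the paper's schema. It shows one Event node holding a whole scene next to the six triples the same scene decomposes into; once those triples are chunked or embedded as separate units, nothing keeps the participants attached to the location.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Entity node carrying static Properties (hypothetical schema)."""
    name: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    """One node per scene: participants, locations, causal links, and outcome together."""
    event_id: str
    action: str
    participants: list[str]
    locations: list[str]
    caused_by: list[str] = field(default_factory=list)  # ids of earlier Events
    outcome: str = ""

    def as_triples(self) -> list[tuple[str, str, str]]:
        """Decompose the scene into atomic (subject, predicate, object) triples."""
        triples = [(self.event_id, "action", self.action)]
        triples += [(self.event_id, "participant", p) for p in self.participants]
        triples += [(self.event_id, "location", loc) for loc in self.locations]
        triples += [(self.event_id, "caused_by", c) for c in self.caused_by]
        if self.outcome:
            triples.append((self.event_id, "outcome", self.outcome))
        return triples


geneva = Entity("Geneva", {"country": "Switzerland"})
accord = Event(
    event_id="e1",
    action="signed the accord",
    participants=["Alice", "Bob"],
    locations=[geneva.name],
    caused_by=["e0"],
    outcome="dispute ended",
)

print(len(accord.as_triples()))  # 6 triples for one scene
```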
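The Stub → Emerging → Established lifecycle can be sketched as mention-count promotion on top of an entity linker. The sketch below is hypothetical: the abstract does not describe the five linking passes or the promotion criteria, so a single normalized-name match stands in for the linker, and the thresholds of 2 and 5 mentions are illustrative only. It shows why linking matters: five surface variants of one name resolve to one node instead of fragmenting into five.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    STUB = "Stub"
    EMERGING = "Emerging"
    ESTABLISHED = "Established"


# Hypothetical promotion thresholds (mention counts), not taken from the paper.
EMERGING_AT = 2
ESTABLISHED_AT = 5


@dataclass
class Node:
    name: str
    mentions: int = 1
    stage: Stage = Stage.STUB

    def record_mention(self) -> None:
        """Count a new mention and promote the node; nodes are never demoted."""
        self.mentions += 1
        if self.mentions >= ESTABLISHED_AT:
            self.stage = Stage.ESTABLISHED
        elif self.mentions >= EMERGING_AT:
            self.stage = Stage.EMERGING


class Linker:
    """Single-pass stand-in for the five-pass linker: exact match on a normalized name."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    @staticmethod
    def normalize(mention: str) -> str:
        return " ".join(mention.lower().split())

    def link(self, mention: str) -> Node:
        key = self.normalize(mention)
        node = self.nodes.get(key)
        if node is None:
            node = Node(name=mention.strip())  # unseen mentions start as Stubs
            self.nodes[key] = node
        else:
            node.record_mention()
        return node


linker = Linker()
for mention in ["Alice", "alice ", "ALICE", "Alice", "Alice"]:
    node = linker.link(mention)
print(len(linker.nodes), node.stage.value)  # 1 Established
```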
