What question did this study set out to answer?

This study asks how effective the Reverie two-layer memory architecture is, and how much of its LongMemEval score comes from the architecture rather than from the underlying synthesis model.

March 12, 2026 | Open Access

Reverie: A Two-Layer Architecture for Persistent AI Memory and the Saturation of LongMemEval

Key Points

Investigated how much the Reverie memory architecture contributes to LongMemEval performance beyond the underlying model.
Developed a two-layer memory architecture: L1 stores raw conversational experiences losslessly, and L2 extracts declarative facts with LLM-confirmed supersession detection.
Conducted a controlled Oracle experiment that runs the same synthesis model (Claude Sonnet 4.6) with perfect retrieval on LongMemEval.
Applied an iterative build-test-prune process in which every component was ablated and removed if it degraded performance or failed to justify its complexity.
Reverie achieved 94.6% on LongMemEval (n=500, GPT-4o judge), within 0.27 points of the top-performing system.
The Oracle experiment scored 93.4%, indicating that LongMemEval results are model-dominated.
The architecture contributed the remaining +1.2 points, concentrated in the knowledge-update and multi-session categories.
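
As an illustration of the build-test-prune idea in the points above, here is a minimal sketch of a greedy ablation loop. It is one possible reading of the process, not the authors' code: the function name `build_test_prune`, the `min_gain` threshold, the component names, and the toy scoring function are all assumptions made for this example, and the numbers are placeholders rather than results from the study.

```python
def build_test_prune(components, evaluate, min_gain=0.0):
    """Greedy ablation: drop any component whose removal costs no more than min_gain."""
    kept = list(components)
    for name in list(components):
        baseline = evaluate(kept)
        without = [c for c in kept if c != name]
        # If the score does not fall by more than min_gain, the component is pruned.
        if baseline - evaluate(without) <= min_gain:
            kept = without
    return kept


# Placeholder effects (invented for the example, NOT measurements from the paper).
toy_effect = {"helps": 1.0, "neutral": 0.0, "hurts": -0.5}


def toy_evaluate(active):
    # Hypothetical benchmark: a base score plus each active component's effect.
    return 90.0 + sum(toy_effect[c] for c in active)


surviving = build_test_prune(["helps", "neutral", "hurts"], toy_evaluate)
```

In this toy setup only the component that measurably improves the score survives. In practice `evaluate` would be a full benchmark run, so each ablation is costly and the order in which components are tested can matter.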

Abstract

Reverie achieves 94.6% on LongMemEval (n=500, GPT-4o judge), within 0.27 points of the top-performing system. A controlled Oracle experiment, running the same synthesis model (Claude Sonnet 4.6) with perfect retrieval, scores 93.4%, revealing that LongMemEval is model-dominated: the architecture contributes +1.2 points, concentrated in knowledge-update and multi-session categories where architectural features (supersession tracking, session summaries) directly apply. This pattern is not unique to Reverie; we estimate comparable architectural deltas across leaderboard systems. The system is a two-layer memory architecture: L1 stores raw conversational experiences losslessly, and L2 extracts declarative facts with LLM-confirmed supersession detection for knowledge updates. Both layers are searched with hybrid vector+keyword retrieval and synthesized by an LLM. The paper's primary contribution is methodological: an iterative build-test-prune development process in which every component was subjected to ablation, and several (including four additional layers, weight decay, contextual embeddings, and LLM-declared edges) were removed when they degraded performance or failed to justify their complexity.
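
The two-layer design described in the abstract can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not Reverie's implementation: the class and function names, the `key`/`value` fact format, the bag-of-words stand-in for an embedding model, the `confirm` callback standing in for the LLM supersession check, and the `alpha` weighting between vector and keyword scores are all hypothetical. The final LLM synthesis step is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    # Fraction of query tokens that appear in the text.
    q = set(tokenize(query))
    return len(q & set(tokenize(text))) / len(q) if q else 0.0


@dataclass
class Fact:
    key: str          # hypothetical slot name, e.g. "user.city"
    value: str
    superseded: bool = False


@dataclass
class TwoLayerMemory:
    l1: list = field(default_factory=list)  # raw conversational turns, stored verbatim
    l2: list = field(default_factory=list)  # extracted declarative facts

    def add_turn(self, text):
        self.l1.append(text)

    def add_fact(self, key, value, confirm=lambda old, new: True):
        # `confirm` stands in for the LLM check that the new fact replaces the old one.
        for fact in self.l2:
            if fact.key == key and not fact.superseded and fact.value != value and confirm(fact, value):
                fact.superseded = True
        self.l2.append(Fact(key, value))

    def search(self, query, k=3, alpha=0.5):
        # Hybrid retrieval over both layers; superseded facts are skipped.
        candidates = [("L1", t) for t in self.l1]
        candidates += [("L2", f"{f.key}: {f.value}") for f in self.l2 if not f.superseded]
        q_vec = embed(query)
        scored = [
            (alpha * cosine(q_vec, embed(text)) + (1 - alpha) * keyword_score(query, text), layer, text)
            for layer, text in candidates
        ]
        return sorted(scored, reverse=True)[:k]


mem = TwoLayerMemory()
mem.add_turn("I just moved to Berlin last week")
mem.add_fact("user.city", "Paris")
mem.add_fact("user.city", "Berlin")  # marks the Paris fact as superseded
results = mem.search("which city does the user live in")
```

Superseded facts stay in L2 for auditability but are excluded from retrieval, which is one simple way to handle knowledge updates. A real system would replace `embed` with a learned embedding model and pass the retrieved items to an LLM for synthesis.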
