What question did this study set out to answer?

To create an architecture that maintains coherence in long interactions with Large Language Models using a structured memory system.

March 28, 2026 · Open Access

Pointer-Grounded Topic Memory: Hierarchical Adaptive Context Management for Coherent Long-Form LLM Interactions

Key Points

Aimed to maintain coherence in long interactions with Large Language Models through a structured memory system.
Introduced Pointer-Grounded Topic Memory (PGTM) to manage conversation coherence.
Developed a dynamic hierarchical tree of topic nodes for structured memory organization.
Implemented a two-pass retrieval cycle to keep context injection selective.
Achieved 92.7% recall in a controlled experiment (48 challenges), compared with 25.0% for flat summarization.
Maintained 92% recall in a live LLM-driven experiment (70 challenges, 2 runs).
Reached 88% recall in a 200-turn large-scale experiment with realistic-length responses, outperforming Simple Retrieval (85%) and Full History (83%) at 10% of Full History's token cost.

Abstract

We introduce Pointer-Grounded Topic Memory (PGTM), an architecture for maintaining coherence in extended interactions with Large Language Models. PGTM organizes conversational memory as a dynamic hierarchical tree of topic nodes, where each node contains a continuously updated narrative summary augmented with explicit pointers to the original messages that ground it. Unlike flat summarization, which loses access to source material, and unlike retrieval-augmented approaches, which lack narrative structure, PGTM maintains both a compressed thematic map of the conversation and direct pathways back to full-resolution exchanges. The architecture introduces three mechanisms: (1) a Topic Memory Tree with support for splitting, merging, and dormancy transitions; (2) a Pointer Lifecycle Protocol managing typed references that track topic evolution and supersession; and (3) a Two-Pass Retrieval Cycle ensuring context injection is inherently selective. We validated PGTM through three experiments of increasing scale: a controlled experiment (48 challenges, 92.7% recall vs 25.0% for flat summarization, McNemar's p < 0.0001), a live LLM-driven experiment (70 challenges, 92% recall across 2 runs), and a 200-turn large-scale experiment with realistic-length responses (88% recall, outperforming Simple Retrieval at 85% and Full History at 83%, at 10% of Full History's token cost). A controlled comparison showed that performance is driven by algorithm design (locked-root crystallization, full-message pointer expansion) rather than model capability, confirming deployability with cost-efficient LLMs.
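As a rough illustration of the ideas in the abstract, the sketch below models a topic node that holds a summary plus pointers to source messages, and a retrieval step that first selects nodes by their summaries and only then expands their pointers into full messages. It is not the authors' implementation: the names `Pointer`, `TopicNode`, `walk`, and `two_pass_retrieve`, the `kind` and `superseded` fields, the rule that dormant nodes are skipped, and the keyword-overlap scoring are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """Hypothetical typed reference from a topic node to an original message."""
    message_id: int
    kind: str = "origin"       # assumed pointer type label
    superseded: bool = False   # assumed flag: a later message replaced this one


@dataclass
class TopicNode:
    """Hypothetical topic node: narrative summary plus grounding pointers."""
    title: str
    summary: str
    pointers: list[Pointer] = field(default_factory=list)
    children: list["TopicNode"] = field(default_factory=list)
    dormant: bool = False      # assumed: dormant topics are not retrieved


def walk(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def two_pass_retrieve(root, messages, query, top_k=1):
    terms = set(query.lower().split())

    # Pass 1: look only at compressed summaries to pick relevant topics.
    # Keyword overlap is a stand-in for whatever relevance scoring is used.
    scored = []
    for node in walk(root):
        if node.dormant:
            continue
        overlap = len(terms & set(node.summary.lower().split()))
        if overlap:
            scored.append((overlap, node))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Pass 2: expand the selected topics' live pointers into full messages.
    context = []
    for _, node in scored[:top_k]:
        context.append(f"[{node.title}] {node.summary}")
        for ptr in node.pointers:
            if not ptr.superseded:
                context.append(messages[ptr.message_id])
    return context


# Toy conversation: message 1 supersedes message 0.
messages = {
    0: "Let's use PostgreSQL for the backend.",
    1: "Actually, switch the backend to SQLite.",
    2: "The logo should be blue.",
}
root = TopicNode("root", "project planning", children=[
    TopicNode("database", "backend database choice is sqlite",
              pointers=[Pointer(0, superseded=True), Pointer(1, kind="revision")]),
    TopicNode("design", "logo colour is blue", pointers=[Pointer(2)]),
])
context = two_pass_retrieve(root, messages, "which database does the backend use")
```

In this toy run only the database topic matches the query, so the injected context contains its summary and the one non-superseded message it points to. The unrelated design topic and the superseded PostgreSQL message are left out.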
