March 17, 2026 · Open Access

Hybrid Governance Architecture (HGA) - A Self-Structuring, Cost-Efficient Memory Substrate for Long-Lived Agents

Key Points

The research aims to enhance memory management in long-lived autonomous agents by introducing HGA.
Developed a three-layer memory substrate combining semantic retrieval, exact recall, and reasoning replay.
Validated the architecture in experiments with Qwen 2.5 7B and GPT-4o mini across 1,000 episodes.
Employed techniques like vector quantization and SHA-256 hashing for efficiency and integrity.
Achieved a 96.6% reduction in output tokens with Stage 3 neurons for repeated patterns.
After 1,000 episodes, 80% of queries bypassed LLM generation entirely and were handled by reasoning replay or deterministic recall instead.
Improved hit rate by 60% for semantic tasks and 27.8% for exact-recall tasks compared to memoryless systems.

Abstract

As LLM-based agents transition from stateless chat systems to persistent, autonomous entities, they encounter critical bottlenecks: context window limitations, prohibitive inference costs, and the inability to guarantee exact recall of structured data. This paper introduces the Hybrid Governance Architecture (HGA), a three-layer governed memory substrate that optimizes agentic memory by combining semantic retrieval, deterministic recall, and progressive reasoning replay.

Core Architecture

HGA organizes memory into three functionally distinct layers that decouple meaning, facts, and structure:

L1 (Adaptive Quantized Semantic Retrieval Layer, AQSRL): Manages approximate semantic recall (e.g., "thematic context") using vector quantization, achieving up to 768x compression compared to traditional dense indexing.
L2 (Deterministic Vault, DV): An exact-recall plane for structured records such as transaction IDs, commitments, and tool outputs. It ensures 100% data integrity via SHA-256 hashing.
L3 (Semantic Neuron Layer, SNL): A structural knowledge plane in which "semantic neurons" evolve through four maturation stages based on experience. Mature (Stage 3) neurons let the system bypass expensive LLM calls by replaying established reasoning chains deterministically.

Key Mechanisms

1. Governance Gate: A triage mechanism that selects one of five execution paths based on L1 confidence signals, data sensitivity (ranging from Public to Restricted), and L3 neuron maturity.
2. Two-Phase Consolidation:
Active Phase: Runs during live API interactions, writing traces and updating relational edge weights.
Passive Phase: An API-free, zero-cost cycle in which the system performs co-occurrence mining, deterministic vault compaction, and salience-guided pruning without external supervision.
3. Progressive LLM Dependency Reduction: The system's reliance on the LLM decreases monotonically as the agent gains experience. Novel inputs are routed to the full LLM, while routine patterns shift to API-free, local execution through the SNL.

Empirical Validation

The architecture was validated using Qwen 2.5 7B and GPT-4o mini across 1,000 episodes:

Token Efficiency: Stage 3 neurons achieved a 96.6% reduction in output tokens for repeated patterns.
Autonomous Capacity: 80% of queries bypassed LLM generation entirely after the 1,000-episode mark (550 via reasoning replay, 250 via deterministic recall).
Performance Gains: Compared to memoryless baselines, HGA improved the hit rate by 60% on semantic tasks and by 27.8% on exact-recall tasks.
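The abstract does not say which quantizer AQSRL uses or how the 768x figure is measured. As a hedged illustration, product quantization (PQ) is a standard vector-quantization technique, though not necessarily the one HGA uses. PQ reaches exactly 768x if a 768-dimensional float32 embedding (3,072 bytes) is stored as four one-byte centroid indices (4 bytes). The dimensionality, subspace count, and codebook size below are assumptions. The shared codebooks are not counted in the ratio.

```python
import random
from typing import List, Sequence

# Hypothetical configuration (NOT stated in the paper): 768-dim float32
# embeddings, split into 4 sub-vectors of 192 dims, each replaced by a
# 1-byte index into a 256-entry codebook.
DIM = 768
NUM_SUBSPACES = 4
CODEBOOK_SIZE = 256            # 256 centroids -> index fits in one byte
SUB_DIM = DIM // NUM_SUBSPACES

Codebooks = List[List[List[float]]]  # [subspace][centroid][component]


def dense_bytes(dim: int = DIM) -> int:
    """Bytes needed to store one uncompressed float32 vector."""
    return dim * 4


def pq_code_bytes(num_subspaces: int = NUM_SUBSPACES) -> int:
    """Bytes needed to store one PQ code (one byte per subspace)."""
    return num_subspaces


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec: Sequence[float], codebooks: Codebooks) -> bytes:
    """Replace each sub-vector with the index of its nearest centroid."""
    code = []
    for m, book in enumerate(codebooks):
        sub = vec[m * SUB_DIM:(m + 1) * SUB_DIM]
        code.append(min(range(len(book)), key=lambda k: _sq_dist(sub, book[k])))
    return bytes(code)


def pq_decode(code: bytes, codebooks: Codebooks) -> List[float]:
    """Approximate reconstruction: concatenate the selected centroids."""
    out: List[float] = []
    for m, idx in enumerate(code):
        out.extend(codebooks[m][idx])
    return out


def random_codebooks(seed: int = 0) -> Codebooks:
    # Real systems learn codebooks with k-means; random centroids keep this self-contained.
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(SUB_DIM)]
             for _ in range(CODEBOOK_SIZE)]
            for _ in range(NUM_SUBSPACES)]


if __name__ == "__main__":
    print(dense_bytes() // pq_code_bytes())  # 768 under the assumed configuration
```

Search then runs over the 4-byte codes, or over distances precomputed per codebook, rather than the full vectors. The trade-off is that retrieval becomes approximate, which matches the "approximate semantic recall" role the abstract assigns to L1.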
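Below is a minimal sketch of an exact-recall store in the spirit of the Deterministic Vault. It assumes records are JSON-serializable dictionaries and that the SHA-256 digest is computed over a canonical JSON encoding (sorted keys, no whitespace). The names `DeterministicVault`, `put`, `get`, and `IntegrityError` are hypothetical. Hashing detects corruption rather than preventing it: a read whose bytes no longer match the stored digest is rejected instead of being returned.

```python
import hashlib
import json
from typing import Any, Dict


class IntegrityError(Exception):
    """Raised when a stored record no longer matches its SHA-256 digest."""


class DeterministicVault:
    """Hypothetical exact-recall store: records keyed by ID, each sealed with a SHA-256 digest."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}
        self._digests: Dict[str, str] = {}

    @staticmethod
    def _canonical(record: Dict[str, Any]) -> bytes:
        # Canonical JSON so that equal records always produce identical bytes and digests.
        return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def put(self, key: str, record: Dict[str, Any]) -> str:
        """Store a record and return its hex SHA-256 digest."""
        blob = self._canonical(record)
        digest = hashlib.sha256(blob).hexdigest()
        self._records[key] = blob
        self._digests[key] = digest
        return digest

    def get(self, key: str) -> Dict[str, Any]:
        """Return the exact stored record, or raise if its bytes fail verification."""
        blob = self._records[key]  # unknown keys raise KeyError: no approximate fallback
        if hashlib.sha256(blob).hexdigest() != self._digests[key]:
            raise IntegrityError(f"record {key!r} failed SHA-256 verification")
        return json.loads(blob)
```

Lookups are by exact key over verified bytes. Recall therefore either returns the original record unchanged or fails loudly, which is the property that separates this layer from approximate semantic retrieval.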
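The abstract says semantic neurons mature through four stages and that Stage 3 neurons replay stored reasoning chains instead of calling the LLM. The sketch below makes three assumptions:

- Stages are numbered 0 to 3.
- Promotion depends only on a count of validated successes.
- The thresholds in `STAGE_THRESHOLDS` apply; these values are invented.

`SemanticNeuron`, `answer`, and the `call_llm` callback are hypothetical names. The paper's actual promotion criteria are not given in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical thresholds: validated successes required to reach each stage.
# The abstract only says there are four stages and that Stage 3 is mature.
STAGE_THRESHOLDS: Dict[int, int] = {1: 1, 2: 3, 3: 5}
MATURE_STAGE = 3


@dataclass
class SemanticNeuron:
    pattern: str
    reasoning_chain: List[str] = field(default_factory=list)
    successes: int = 0
    stage: int = 0

    def record_success(self, chain: List[str]) -> None:
        """Store the latest validated chain and promote if a threshold is reached."""
        self.reasoning_chain = list(chain)
        self.successes += 1
        for stage, needed in STAGE_THRESHOLDS.items():
            if self.successes >= needed and stage > self.stage:
                self.stage = stage

    @property
    def mature(self) -> bool:
        return self.stage >= MATURE_STAGE


def answer(neuron: SemanticNeuron, query: str,
           call_llm: Callable[[str], List[str]]) -> Tuple[List[str], bool]:
    """Return (reasoning chain, whether the LLM was called)."""
    if neuron.mature:
        # Deterministic replay: no LLM call and no generated output tokens.
        return list(neuron.reasoning_chain), False
    chain = call_llm(query)
    neuron.record_success(chain)  # assumes the LLM output was validated upstream
    return chain, True
```

In a toy run of eight identical queries, the first five call the LLM and the last three are replayed. This mirrors, without reproducing, the progressive dependency reduction described above.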
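The governance gate can be pictured as a pure routing function. The abstract names its three inputs (L1 confidence, data sensitivity, and L3 maturity) and says there are five paths, but it does not say what the paths are. Everything else in this sketch is an assumption:

- the path names;
- the two intermediate sensitivity levels;
- the 0.8 confidence threshold;
- the rule order;
- the extra `vault_hit` flag used to trigger deterministic recall.

```python
from enum import Enum, IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1        # assumed intermediate level
    CONFIDENTIAL = 2    # assumed intermediate level
    RESTRICTED = 3


class Path(Enum):
    # Hypothetical names for the five execution paths.
    VAULT_RECALL = "vault_recall"        # exact answer from the Deterministic Vault
    NEURON_REPLAY = "neuron_replay"      # Stage 3 neuron replays its stored chain
    LOCAL_ONLY = "local_only"            # hypothetical: restricted data stays on the host
    RETRIEVAL_AUGMENTED_LLM = "rag_llm"  # LLM call grounded in L1 context
    FULL_LLM = "full_llm"                # novel input, full LLM reasoning


def route(l1_confidence: float, sensitivity: Sensitivity, neuron_stage: int,
          vault_hit: bool = False, confidence_threshold: float = 0.8) -> Path:
    """Pick one execution path, trying the API-free paths first."""
    if vault_hit:
        return Path.VAULT_RECALL
    if neuron_stage >= 3:
        return Path.NEURON_REPLAY
    if sensitivity >= Sensitivity.RESTRICTED:
        return Path.LOCAL_ONLY
    if l1_confidence >= confidence_threshold:
        return Path.RETRIEVAL_AUGMENTED_LLM
    return Path.FULL_LLM
```

Checking the API-free paths first means a vault hit or a mature neuron is never escalated to the LLM. That is one simple way to obtain a monotone drop in LLM usage; the real gate may weigh the three signals differently.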
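The passive consolidation phase lists three API-free jobs but does not give their algorithms. The sketch below uses simple stand-ins for each one:

- Co-occurrence mining: counting pairs of items across episode traces.
- Salience-guided pruning: an exponentially decayed access count serves as salience.
- Vault compaction: de-duplicating vault entries that share a SHA-256 digest.

The function names, the `min_count`, `decay`, and `threshold` parameters, and the salience formula are all assumptions. None of the functions call an external API.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple


def mine_cooccurrence(traces: Iterable[List[str]],
                      min_count: int = 2) -> Dict[FrozenSet[str], int]:
    """Count how often two memory items appear in the same trace; keep frequent pairs."""
    counts = Counter()
    for trace in traces:
        for a, b in combinations(sorted(set(trace)), 2):
            counts[frozenset((a, b))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def salience(access_count: int, age_episodes: int, decay: float = 0.9) -> float:
    """Hypothetical salience: access frequency discounted exponentially by age."""
    return access_count * (decay ** age_episodes)


def prune(items: Dict[str, Tuple[int, int]],
          threshold: float = 1.0) -> Dict[str, Tuple[int, int]]:
    """Keep items whose salience meets the threshold; values are (access_count, age)."""
    return {k: v for k, v in items.items() if salience(*v) >= threshold}


def compact_vault(key_to_digest: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical compaction: keep one key (the smallest) per identical SHA-256 digest."""
    first_key: Dict[str, str] = {}
    for key in sorted(key_to_digest):
        first_key.setdefault(key_to_digest[key], key)
    return {key: digest for digest, key in first_key.items()}
```

Frequent pairs found by the mining step could then strengthen the relational edges that the active phase updates. Pruning and compaction keep the memory footprint bounded between sessions.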
