Recent advances in large language models (LLMs) have enabled autonomous agentic systems capable of multi-step planning, tool use, and iterative reasoning. Despite their promise, many existing agent-based implementations remain ad hoc, tightly coupled, and insufficiently governed, which limits their suitability for enterprise and safety-critical environments. In particular, the lack of deterministic control, systematic evaluation, and auditability poses significant challenges for real-world deployment. This paper introduces SAGE-AI (Structured Agentic Governance for Enterprise AI), a composable architecture and evaluation framework for agentic AI systems. SAGE-AI decomposes autonomous behavior into explicit planner, executor, critic, and synthesizer roles, coordinated through a governance-aware orchestration layer. This design enables policy enforcement, controlled tool usage, and end-to-end traceability without modifying the underlying language models. We present a controlled experimental evaluation comparing SAGE-AI against a monolithic agent baseline across planning, tool-use, and verification tasks. The evaluation emphasizes architectural behavior rather than raw model performance: it measures task success, invalid actions, trace completeness, and recovery behavior under identical conditions. Results show that, relative to monolithic agents, SAGE-AI reduces unsafe actions and improves both traceability and task reliability, at the cost of an expected latency overhead from governance and multi-role coordination. By combining architectural decomposition with a statistically rigorous, auditable evaluation methodology, this work bridges the gap between experimental agentic systems and production-ready autonomous AI.
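To make the role decomposition described in the abstract concrete, the following is a minimal sketch of a planner, executor, critic, and synthesizer coordinated by an orchestration layer that enforces a tool allowlist and records a trace. It is an illustration under stated assumptions, not SAGE-AI's actual implementation: the names `Step`, `Orchestrator`, `allowed_tools`, and the demo roles are hypothetical, and the roles are plain Python functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Step:
    """One planned action: a tool name and its argument (hypothetical shape)."""
    tool: str
    arg: str


@dataclass
class Orchestrator:
    """Hypothetical governance-aware coordinator; not the paper's API."""
    allowed_tools: Set[str]
    tools: Dict[str, Callable[[str], str]]
    trace: List[Dict[str, Any]] = field(default_factory=list)

    def log(self, role: str, event: str, detail: Any) -> None:
        # Every decision is appended to the trace for later auditing.
        self.trace.append({"role": role, "event": event, "detail": detail})

    def run(self, task, planner, critic, synthesizer) -> str:
        plan = planner(task)
        self.log("planner", "plan", [f"{s.tool}({s.arg})" for s in plan])
        results = []
        for step in plan:
            # Policy enforcement: block tools outside the allowlist or registry.
            if step.tool not in self.allowed_tools or step.tool not in self.tools:
                self.log("governance", "blocked", step.tool)
                continue
            output = self.tools[step.tool](step.arg)  # executor role
            self.log("executor", "tool_call", f"{step.tool} -> {output}")
            accepted = critic(step, output)
            self.log("critic", "verdict", accepted)
            if accepted:
                results.append(output)
        answer = synthesizer(task, results)
        self.log("synthesizer", "answer", answer)
        return answer


# Stand-in roles (assumptions for the demo; a real system would call a model).
def demo_planner(task: str) -> List[Step]:
    return [Step("upper", task), Step("delete_db", "prod")]


def demo_critic(step: Step, output: str) -> bool:
    return bool(output)


def demo_synthesizer(task: str, results: List[str]) -> str:
    return " | ".join(results)


orch = Orchestrator(
    allowed_tools={"upper"},
    tools={"upper": str.upper, "delete_db": lambda a: "deleted " + a},
)
answer = orch.run("hello", demo_planner, demo_critic, demo_synthesizer)
blocked = [e for e in orch.trace if e["event"] == "blocked"]
print(answer, len(orch.trace), len(blocked))
```

In this sketch the disallowed `delete_db` step is never executed, yet it still appears in the trace as a blocked action. That is the kind of invalid-action count and trace completeness the evaluation measures, here reduced to a toy scale.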
John Benito Jesudasan Peter (Sun,) studied this question.