Achilles was invincible in battle, except for one point of structural vulnerability that no amount of strength could compensate for. Modern LLM-based agent frameworks and patterns (LangChain, AutoGen, CrewAI, ReAct) share this property: impressive capability in controlled settings, yet catastrophically exploitable in regulated production environments through a single architectural flaw, namely that the language model controls the decision. Organizations in regulated sectors (finance, insurance, healthcare, legal, compliance) face a direct consequence: these frameworks cannot be deployed in workflows subject to the EU AI Act, DORA, or GDPR Article 22, because they provide no structural guarantee of determinism, auditability, or equal treatment. Traditional symbolic agent systems (JADE, Jason, Jadex) satisfy regulatory requirements but cannot ingest the unstructured natural-language inputs that define real enterprise workflows. The industry needs both properties simultaneously, and no existing framework provides them.
AQUILES is a production architecture for AI agents in regulated domains that resolves this gap through principled separation of concerns, instantiating the HADD paradigm (Hybrid Agents with Deterministic Decisions). AQUILES organizes agent functionality into five cooperating layers: an Interface Layer converting unstructured input into typed, validated beliefs via LLM sensors; a Cognition Layer performing pure-function BDI deliberation fully determined by its inputs; a Planning Layer selecting from a pre-verified HTN plan library without runtime synthesis; an Execution Layer enforcing typed precondition and postcondition contracts on every capability invocation; and a transverse Observation Layer producing append-only audit entries synchronously with every state transition. Language models are confined strictly to the perception boundary: they parse input into beliefs, and they never select goals, plans, or capabilities. The heel remains; it is simply no longer load-bearing.
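The layer separation described above can be illustrated with a minimal Python sketch. It is not the AQUILES implementation: the `ClaimBelief` schema, the `sense`, `deliberate`, and `plan_for` functions, the 10,000 EUR threshold, and the plan names are all hypothetical, chosen only to show untrusted LLM output being validated into a typed belief before a pure function selects a goal and a fixed library supplies the plan.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class ClaimType(Enum):
    AUTO = "auto"
    HOME = "home"


@dataclass(frozen=True)
class ClaimBelief:
    """Typed belief produced at the perception boundary (hypothetical schema)."""
    claim_type: ClaimType
    amount_eur: float


def sense(raw_llm_output: Mapping[str, object]) -> ClaimBelief:
    """Interface Layer: validate untrusted LLM output into a typed belief.

    Anything that does not fit the schema raises ValueError, so malformed
    model output never reaches deliberation.
    """
    claim_type = ClaimType(raw_llm_output.get("claim_type"))  # ValueError if unknown
    amount = raw_llm_output.get("amount_eur")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
        raise ValueError(f"invalid amount: {amount!r}")
    return ClaimBelief(claim_type, float(amount))


def deliberate(belief: ClaimBelief) -> str:
    """Cognition Layer: pure function from belief to goal; no LLM, no I/O."""
    if belief.amount_eur > 10_000:  # assumed threshold, for illustration only
        return "escalate_to_human"
    return "auto_settle"


# Planning Layer: a fixed library; nothing is synthesized at runtime.
PLAN_LIBRARY: Mapping[str, tuple] = {
    "escalate_to_human": ("open_case", "notify_adjuster"),
    "auto_settle": ("verify_policy", "issue_payment"),
}


def plan_for(goal: str) -> tuple:
    """Look up the approved plan; KeyError means the goal has no plan."""
    return PLAN_LIBRARY[goal]


# The dict stands in for parsed LLM output; the model never sees the goal or plan.
belief = sense({"claim_type": "auto", "amount_eur": 2500})
plan = plan_for(deliberate(belief))
```

Because `deliberate` and `plan_for` depend only on their arguments, the same belief always yields the same plan, which is the reproducibility property the architecture relies on.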
The HADD paradigm is codified as six architectural invariants: (I1) Typed Role Inversion: LLMs act as sensors only, never as control-flow components; (I2) Deterministic Cognition: the reasoning layer is a pure function of beliefs, goals, and rules; (I3) Bounded Planning: execution draws exclusively from a pre-verified plan library; (I4) Validated Execution: every capability invocation passes typed pre/post-condition checks; (I5) Complete Observability: every decision is forensically reconstructable from the audit log; (I6) Epistemic Precondition: no belief enters the BDI cycle without satisfying freshness, non-contestation, and source triangulation, enforced by the EVR module (Epistemic Verification for RAG). Any implementation satisfying all six invariants acquires reproducibility, freedom from LLM-hallucinated content in agent state, LLM provider independence, and structural alignment with EU AI Act Articles 12–15, all as architectural properties rather than retrofitted compliance measures. AQUILES partitions agents into cognitive holons (BDI-HTN reasoning components subject to full HADD governance) and reactive holons (deterministic capability executors verified by typed contracts alone). In observed production deployments, 70–80% of holons by count are reactive, meaning governance complexity scales with the cognitive subset rather than with total component count. The AQUILES protocol is language-agnostic by design: cognitive holons are typically written in Python (Anthropic SDK, sentence-transformers, pypdf), endpoint-monitoring holons in Go (single-binary cross-compilation), and blockchain and zero-knowledge holons in Rust (arkworks, halo2, revm). We prove a Language Neutrality property: HADD compliance is preserved across heterogeneous polyglot deployments. For autonomous field deployments, AQUILES derives MYRMIDON agents that execute a signed MissionPackage autonomously on constrained hardware, inheriting the safety guarantees of AQUILES without requiring runtime connectivity.
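As a hedged illustration of invariant I6, the following Python sketch shows one way a belief-admission gate could check freshness, non-contestation, and source triangulation. It is not the EVR module: the `Evidence` type, the `admit_belief` function, the 24-hour age limit, and the two-source minimum are assumptions made for this example, as is the choice to ignore stale evidence rather than reject the belief outright.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Evidence:
    """One retrieved statement about a fact (hypothetical shape)."""
    source_id: str
    value: str
    retrieved_at: datetime


def admit_belief(
    evidence: Sequence[Evidence],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed freshness window
    min_sources: int = 2,                      # assumed triangulation minimum
) -> Optional[str]:
    """Return the agreed value if all three checks pass, otherwise None."""
    # Freshness: only evidence retrieved within max_age is considered.
    fresh = [e for e in evidence if now - e.retrieved_at <= max_age]
    # Non-contestation: all fresh evidence must state the same value.
    values = {e.value for e in fresh}
    if len(values) != 1:
        return None
    # Triangulation: the value must come from enough distinct sources.
    if len({e.source_id for e in fresh}) < min_sources:
        return None
    return next(iter(values))


now = datetime(2025, 1, 2, tzinfo=timezone.utc)
agreed = admit_belief(
    [
        Evidence("registry", "policy_active", now - timedelta(hours=1)),
        Evidence("crm", "policy_active", now - timedelta(hours=2)),
    ],
    now,
)
```

A caller would pass only non-`None` results into the deliberation cycle; a `None` result means the candidate belief is held back.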
This paper makes five engineering contributions: (C1) the HADD paradigm, formalized as six architectural invariants with rationale and derived operational properties; (C2) the cognitive/reactive holon distinction and its consequences for governance cost; (C3) a polyglot holon model with a Language Neutrality proof and a domain-language affinity mapping across Python, Go, and Rust; (C4) a multi-tenant operational-cell formalism enabling cryptographically enforced tenant isolation for regulated multi-client deployments; (C5) four reusable design patterns extracted from production experience (Sensor Firewall, Belief Expiry, Capability Contract, Observation Fanout), together with measurement methodology, adoption guidance, and an explicit characterization of the architecture's limits.
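To make the Capability Contract pattern and the append-only audit requirement concrete, here is a minimal Python sketch under stated assumptions. The `AuditLog` class, the `with_contract` decorator, the SHA-256 hash chaining, and the `issue_payment` capability are all hypothetical; the source does not specify how AQUILES implements contracts or tamper evidence.

```python
import functools
import hashlib
import json
from typing import Any, Callable


class AuditLog:
    """Append-only log; each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries = []

    def append(self, event: dict) -> None:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self._entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True

    def __len__(self) -> int:
        return len(self._entries)


def with_contract(pre: Callable[[Any], bool],
                  post: Callable[[Any, Any], bool],
                  log: AuditLog):
    """Wrap a capability so every call is checked and logged synchronously."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(arg):
            if not pre(arg):
                log.append({"capability": fn.__name__, "status": "pre_failed", "input": arg})
                raise ValueError("precondition failed")
            result = fn(arg)
            ok = post(arg, result)
            log.append({"capability": fn.__name__,
                        "status": "ok" if ok else "post_failed",
                        "input": arg, "output": result})
            if not ok:
                raise ValueError("postcondition failed")
            return result
        return wrapper
    return decorator


log = AuditLog()


@with_contract(pre=lambda amount: 0 < amount <= 10_000,          # assumed limit
               post=lambda amount, receipt: receipt["paid"] == amount,
               log=log)
def issue_payment(amount: int) -> dict:
    return {"paid": amount}


receipt = issue_payment(250)
```

In this sketch a rejected call still leaves an audit entry before the exception propagates, so the log records refusals as well as successful invocations.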
Errecalde et al. (Fri,) studied this question.