Abstract

Production AI systems routinely report confidence scores that bear no reliable relationship to their actual probability of being correct. A 110-layer ResNet reports an Expected Calibration Error of 12.75% on CIFAR-100 (Guo et al., 2017). NREL’s baseline system-level solar forecasts, designed to produce 90% prediction intervals, achieve only 75% empirical coverage in held-out evaluation (Moradi et al., 2026). Downstream decisions, including whether to auto-process or escalate, whether to discharge a patient, and how to size a trading position, depend on these scores. This paper presents Verified Autonomy, a nine-layer defence-in-depth architecture for engineering trust into AI systems in production. The framework is organised into three tiers: uncertainty reporting (inverse confidence weighting, outlier detection, visible failures), architectural controls (calibration and conformal prediction, deterministic guardrails, retrieval-augmented generation as explainability), and verification (adversarial testing, cryptographic audit trails, formal verification). The central argument is that trust is not a feature of any single technique. It is an emergent property of a layered architecture where each layer compensates for the failure modes of the others. We ground this argument in the Integrity Clash vulnerability (Nemecek et al., 2026), which demonstrates that independent verification layers that do not cross-validate each other create exploitable gaps. All nine layers are accompanied by open-source reference implementations totalling 475 tests at 98 to 100% coverage, each runnable in isolation with a single command. The architecture has been extended to govern the transfer points between systems, not only the outputs of individual systems, with maritime intelligence as the worked example.
The full code, this preprint, and the practitioner-facing field guide companion are available at https://github.com/antnewman/verified-autonomy under CC BY 4.0 (content) and MIT (code).
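The two headline figures in the abstract, Expected Calibration Error and empirical interval coverage, can be illustrated with a minimal sketch. This is not taken from the reference implementations in the repository; the function names `expected_calibration_error` and `empirical_coverage`, the choice of ten equal-width bins, and the half-open binning convention are assumptions made for illustration only.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count; the appropriate value depends on the evaluation
) -> float:
    """Weighted mean of |accuracy - mean confidence| over equal-width confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def empirical_coverage(
    lower: Sequence[float],
    upper: Sequence[float],
    observed: Sequence[float],
) -> float:
    """Fraction of observations that fall inside their prediction interval."""
    if not (len(lower) == len(upper) == len(observed)):
        raise ValueError("lower, upper and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    hits = sum(1 for lo, hi, y in zip(lower, upper, observed) if lo <= y <= hi)
    return hits / len(observed)


if __name__ == "__main__":
    # Toy data: a model that claims 90% confidence but is right 3 times out of 4.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
    # Toy data: nominal intervals that contain 3 of 4 observations.
    print(empirical_coverage([0, 0, 0, 0], [1, 1, 1, 1], [0.5, 0.5, 2.0, 0.5]))
```

A well-calibrated system would drive the first quantity towards zero, and a well-formed 90% prediction interval would yield a coverage close to 0.90 on held-out data. The gap between nominal and empirical values is what the calibration and conformal prediction layer is meant to close.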
Ant Newman
Rodrigues Economic Chamber and Industry
Shanti Greene
Central Connecticut State University
Malia Hosseini
Newman et al. studied this question.
synapsesocial.com/papers/6a04158679e20c90b4445404 — DOI: https://doi.org/10.5281/zenodo.19096229