March 19, 2026 · Open Access

A Unified Framework for AI Reliability Across Formally Distinct Human Reasoning Domains

Key Points

Set out to characterize the mechanism governing AI reliability above the current frontier (roughly 93% single-agent accuracy on complex reasoning tasks) across formally distinct reasoning domains.
Conducted three empirical studies comprising 7,950 individually scored data points and 5,004 evaluation sessions.
Evaluated six frontier AI models across three fundamentally distinct reasoning domains.
Identified error correlation ρ̂ between ensemble components, measurable from observed performance, as the variable governing reliability.
Found that compute scaling yields ρ̂ = 0.80, whereas architectural role-separation (Generator–Auditor–Adversary–Synthesizer; GAAS) reduces it to ρ̂ = 0.19.
Reported a 77% reduction in error correlation, roughly a four-fold improvement, on identical compute.
Identified a calibration inversion in indeterminate domains: the most accurate model had the second-worst confidence-interval calibration.
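The calibration inversion is easier to picture with a concrete metric. This summary does not define "confidence-interval calibration", so the sketch below assumes it means the gap between how often a model's stated intervals actually contain the true value (empirical coverage) and the level the intervals claim (here a hypothetical 90%). The functions `interval_coverage`, `calibration_gap` and `midpoint_mae`, and all data, are illustrative inventions, not the paper's method. The example shows one way a model can be the more accurate of two yet the worse calibrated.

```python
from typing import Sequence, Tuple

# (low, high) bounds a model states for a quantity it is asked to estimate.
Interval = Tuple[float, float]


def interval_coverage(intervals: Sequence[Interval], truths: Sequence[float]) -> float:
    """Fraction of stated intervals that contain the realised value."""
    if not truths or len(intervals) != len(truths):
        raise ValueError("need one interval per realised value")
    hits = sum(1 for (low, high), t in zip(intervals, truths) if low <= t <= high)
    return hits / len(truths)


def calibration_gap(intervals: Sequence[Interval], truths: Sequence[float],
                    nominal: float = 0.9) -> float:
    """|empirical coverage - nominal level|; 0 means perfectly calibrated.

    Assumption: this coverage gap stands in for the paper's
    confidence-interval calibration measure, which is not defined here.
    """
    return abs(interval_coverage(intervals, truths) - nominal)


def midpoint_mae(intervals: Sequence[Interval], truths: Sequence[float]) -> float:
    """Mean absolute error of interval midpoints, a crude accuracy proxy."""
    if not truths or len(intervals) != len(truths):
        raise ValueError("need one interval per realised value")
    return sum(abs((low + high) / 2 - t)
               for (low, high), t in zip(intervals, truths)) / len(truths)


# Hypothetical data: four questions with known answers.
truths = [10.0, 20.0, 30.0, 40.0]
# Model A: accurate midpoints, but overconfident (too narrow) 90% intervals.
model_a = [(9.5, 10.5), (19.0, 19.5), (29.0, 29.8), (40.5, 41.0)]
# Model B: cruder midpoints, but intervals wide enough to cover every answer.
model_b = [(4.0, 14.0), (15.0, 35.0), (22.0, 42.0), (33.0, 49.0)]

for name, intervals in [("A", model_a), ("B", model_b)]:
    print(name,
          f"MAE={midpoint_mae(intervals, truths):.3f}",
          f"gap={calibration_gap(intervals, truths):.2f}")
# A MAE=0.525 gap=0.65
# B MAE=2.250 gap=0.10
```

Under this assumed metric, model A has the lower point-estimate error but covers only one of four answers, so its calibration gap is far larger than model B's.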

Abstract

Artificial intelligence systems now mediate an estimated 2.8 billion daily human interactions, making reliability a matter of societal infrastructure. Yet the mechanism governing reliability above the current frontier—approximately 93% single-agent accuracy on complex reasoning tasks—remains uncharacterised. Here we report a three-study empirical programme spanning 7,950 individually scored data points, 5,004 evaluation sessions, and six frontier AI models across three fundamentally distinct problem domains. The governing variable is error correlation ρ̂ between ensemble components, measurable from observed performance. Compute scaling yields ρ̂ = 0.80; architectural role-separation (Generator–Auditor–Adversary–Synthesizer; GAAS) reduces this to ρ̂ = 0.19—a four-fold improvement (77% reduction in error correlation) on identical compute. A calibration inversion in indeterminate domains—the most accurate model simultaneously achieves second-worst confidence-interval calibration—demonstrates that intelligence and reliability are empirically orthogonal. The GAAS framework and a ρ̂ estimator constitute a deployable architectural specification for high-reliability AI.
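The abstract says ρ̂ is measurable from observed performance but does not spell out the estimator. The following is a minimal sketch under one plausible assumption: ρ̂ is the mean pairwise Pearson correlation of per-item binary error indicators across ensemble components (for 0/1 data this is the phi coefficient). The input layout, the function names `pearson` and `estimate_rho`, and the example vectors are hypothetical; the paper's estimator may weight pairs differently or treat error-free components in another way.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; on 0/1 error indicators this is the phi coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) * (a - mx) for a in x)
    vy = sum((b - my) * (b - my) for b in y)
    if vx == 0 or vy == 0:
        # A component that is always right (or always wrong) has no error variance.
        raise ValueError("correlation undefined for a constant error vector")
    return cov / sqrt(vx * vy)


def estimate_rho(errors: Sequence[Sequence[int]]) -> float:
    """Mean pairwise error correlation across ensemble components.

    errors[k][i] == 1 if component k answered item i incorrectly, else 0.
    Hypothetical estimator: a plain average of Pearson correlations over
    all component pairs.
    """
    if len(errors) < 2:
        raise ValueError("need at least two components")
    pairs = list(combinations(errors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical error vectors for three components over six items (1 = wrong).
same_failures = [[1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]]
spread_failures = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]

print(f"{estimate_rho(same_failures):.2f}")    # 1.00: all components fail on the same items
print(f"{estimate_rho(spread_failures):.2f}")  # -0.50: no two components fail on the same item
```

Read this way, a ρ̂ near 1 means components share their failures, so adding more of them adds little reliability, while a ρ̂ near 0 means failures rarely coincide and one component's mistakes can be caught by another, which is the direction the abstract attributes to role-separation.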
