What question did this study set out to answer?

This work aims to highlight reliability failures in AI-mediated workflows and the importance of human judgement in ensuring task completion accuracy.

January 14, 2026 (Open Access)

Judgment as the Primary Control Plane in AI-Mediated Workflows: False Completion, Artefact Mismatch, and the Limits of Model-Centric Reliability

Key Points

Analysis of cross-system incidents in AI document workflows
Field observations of verification processes
Examination of artefact completion claims
Identified instances of false completion in AI systems
Found reliable recovery linked to human judgement gates
Emphasized the role of interaction governance in AI reliability

Abstract

This working paper documents and formalises a cross-system reliability failure mode in AI-mediated document workflows: systems may confidently declare task completion while the underlying deliverable artefact is empty, incomplete, corrupted, inaccessible, or otherwise non-functional. The paper argues that this phenomenon—termed false completion—is not best understood as a model-specific defect. Instead, it is structurally induced by interaction patterns in which reality verification and closure authority are implicitly delegated to the system without enforced, non-bypassable judgement gates. The paper is grounded in repeated field incidents across distinct AI systems used in high-intensity, real document-production contexts. In each incident series, artefact-level inspection revealed divergence between declared and actual completion, and reliable recovery occurred only when human operators enforced strict verification gates and refused narrative closure without evidence. The central claim is that reliability in AI-mediated workflows should be reframed as an interaction-governance property: human judgement is the primary control plane that validates reality, prevents premature closure, and interrupts repeated failure loops. The work is intended as a methodological anchor and priority claim for a judgement-centred reliability framing, not as product benchmarking, vendor attribution, or a complaint report.
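The gate pattern described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `inspect_artefact`, `judgement_gate`, `GateResult`, and the `human_approves` callback are hypothetical, and the checks (existence, readability, non-emptiness) are an assumed stand-in for whatever artefact-level inspection a real workflow would need. The point it shows is only that a completion claim alone never closes the task.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class GateResult:
    closed: bool          # True only if the task may be marked complete
    reasons: List[str]    # why closure was refused (empty when closed)


def inspect_artefact(path: Path, min_bytes: int = 1) -> List[str]:
    """Inspect the deliverable itself and return any problems found."""
    if not path.exists():
        return ["artefact is missing"]
    if not path.is_file():
        return ["artefact is not a regular file"]
    try:
        data = path.read_bytes()
    except OSError as exc:
        return [f"artefact is inaccessible: {exc}"]
    if len(data) < min_bytes:
        return ["artefact is empty or too small"]
    return []


def judgement_gate(
    claimed_complete: bool,
    path: Path,
    human_approves: Callable[[Path], bool],
) -> GateResult:
    """Allow closure only with artefact evidence AND explicit human approval.

    The system's own completion claim is treated as a request to verify,
    never as proof of completion.
    """
    if not claimed_complete:
        return GateResult(False, ["system has not claimed completion"])
    problems = inspect_artefact(path)
    if problems:
        return GateResult(False, problems)
    if not human_approves(path):
        return GateResult(False, ["human reviewer withheld approval"])
    return GateResult(True, [])
```

In this sketch the human decision is passed in as a callback so that the gate cannot be satisfied by the system's narrative alone; a real deployment would also need format-specific checks (for example, that a document actually opens and contains the expected sections), which are outside the scope of this illustration.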

Cite This Study

Sincere Ann Ma studied this question.

synapsesocial.com/papers/6966f32713bf7a6f02c00f0b https://doi.org/10.5281/zenodo.18181421