This working paper documents and formalises a cross-system reliability failure mode in AI-mediated document workflows: systems may confidently declare task completion while the underlying deliverable artefact is empty, incomplete, corrupted, inaccessible, or otherwise non-functional. The paper argues that this phenomenon—termed false completion—is not best understood as a model-specific defect. Instead, it is structurally induced by interaction patterns in which reality verification and closure authority are implicitly delegated to the system without enforced, non-bypassable judgement gates. The paper is grounded in repeated field incidents across distinct AI systems used in high-intensity, real document-production contexts. In each incident series, artefact-level inspection revealed divergence between declared and actual completion, and reliable recovery occurred only when human operators enforced strict verification gates and refused narrative closure without evidence. The central claim is that reliability in AI-mediated workflows should be reframed as an interaction-governance property: human judgement is the primary control plane that validates reality, prevents premature closure, and interrupts repeated failure loops. The work is intended as a methodological anchor and priority claim for a judgement-centred reliability framing, not as product benchmarking, vendor attribution, or a complaint report.
Sincere Ann Ma (Thu,) studied this question.