Large language models generate confident-sounding responses regardless of whether they possess relevant knowledge — a failure mode known as hallucination. Current approaches to confidence estimation (output logits, calibration scaling, verbalized uncertainty) all derive confidence from the same generative process that produces answers, creating a fundamental confound: the confidence estimate cannot be independent of the answer. We present an experiment demonstrating that epistemic self-assessment — the ability to accurately judge one's own knowledge — can be achieved through structural analysis of a knowledge graph, providing a confidence signal that is independent of the answer generation process. A developmental knowledge graph agent computes a multi-dimensional confidence score for every response based on the structural properties of relevant subgraphs. When confidence falls below a calibrated threshold, the agent refuses to answer with "I don't know enough about this yet" — a hard boundary that cannot be overridden by prompting. Over 500 cycles of continuous learning, the system demonstrated well-calibrated self-assessment: at high confidence, the agent achieved 82.9% accuracy; at the lowest confidence, accuracy was 3.6% — confirming that the agent's confidence signal reliably predicts actual performance. The refusal mechanism improved answer reliability by 15.5 percentage points (from 48.1% to 63.6%) by refusing 31.3% of questions, of which 87.7% would have been errors. Beyond per-query confidence, the agent maintains a persistent self-model tracking domain-level competence, growth trajectories, and 10 types of metacognitive insights stored as nodes in the neural graph — creating recursive self-knowledge that influences subsequent cognition. All computation is performed on the graph structure with zero language model involvement.
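The abstract does not say which structural properties feed the confidence score, how they are weighted, or how the threshold is calibrated. The Python sketch below is only a minimal illustration under stated assumptions. The three dimensions (`coverage`, `density`, `reinforcement`), their weights, the plain weighted sum, and the 0.4 default threshold are hypothetical placeholders, not the paper's method. It shows two ideas the abstract relies on. The first is a hard refusal gate computed from graph structure alone. The second is the selective-prediction bookkeeping behind figures such as refusal rate, accuracy on answered questions, the share of refusals that would have been errors, and accuracy per confidence band.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

REFUSAL = "I don't know enough about this yet"

# Hypothetical dimension weights; the paper does not publish its scoring formula.
WEIGHTS = {"coverage": 0.5, "density": 0.2, "reinforcement": 0.3}


@dataclass
class SubgraphStats:
    """Hypothetical structural features of the subgraph relevant to one query."""
    coverage: float       # fraction of query concepts present in the graph, in [0, 1]
    density: float        # edge density of the relevant subgraph, in [0, 1]
    reinforcement: float  # normalized mean edge strength, in [0, 1]


def confidence(s: SubgraphStats) -> float:
    """Weighted sum of structural dimensions; no language model is consulted."""
    return (WEIGHTS["coverage"] * s.coverage
            + WEIGHTS["density"] * s.density
            + WEIGHTS["reinforcement"] * s.reinforcement)


def answer_or_refuse(stats: SubgraphStats, answer: str, threshold: float = 0.4) -> str:
    """Hard gate: the draft answer is replaced by the refusal when confidence is too low."""
    return answer if confidence(stats) >= threshold else REFUSAL


def refusal_metrics(records: List[Tuple[float, bool]], threshold: float) -> Dict[str, float]:
    """records holds (confidence, would_have_been_correct) for every question asked."""
    total = len(records)
    answered = [ok for c, ok in records if c >= threshold]
    refused = [ok for c, ok in records if c < threshold]
    baseline = sum(ok for _, ok in records) / total
    selective = sum(answered) / len(answered) if answered else 0.0
    return {
        "baseline_accuracy": baseline,            # accuracy if every question were answered
        "selective_accuracy": selective,          # accuracy on questions actually answered
        "gain_pp": 100 * (selective - baseline),  # improvement in percentage points
        "refusal_rate": len(refused) / total,
        "refused_would_be_errors": (sum(not ok for ok in refused) / len(refused)
                                    if refused else 0.0),
    }


def accuracy_by_bin(records: List[Tuple[float, bool]], n_bins: int = 5) -> List[Optional[float]]:
    """Accuracy within equal-width confidence bins; None marks an empty bin."""
    bins: List[List[bool]] = [[] for _ in range(n_bins)]
    for c, ok in records:
        bins[min(int(c * n_bins), n_bins - 1)].append(ok)
    return [sum(b) / len(b) if b else None for b in bins]


if __name__ == "__main__":
    # Six invented toy questions, not data from the paper.
    toy = [(0.95, True), (0.85, True), (0.65, False),
           (0.35, False), (0.25, False), (0.05, True)]
    print(refusal_metrics(toy, threshold=0.5))
    print(accuracy_by_bin(toy))
```

With this toy data and a threshold of 0.5, half the questions are refused. Two of the three refused would have been wrong, and accuracy on the answered questions rises from 50% to about 66.7%. The gain reported in the abstract is computed the same way, as selective minus baseline accuracy: 63.6% minus 48.1% gives 15.5 percentage points.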
Sai Tilak Pally
Acumen (United States)
synapsesocial.com/papers/69a7cd1dd48f933b5eed921f — DOI: https://doi.org/10.5281/zenodo.18834620