The Dunning-Kruger “valley of despair” is, in a language model, a near-tie of about 0.4 nats in the substrate’s balance-of-evidence margin that greedy decoding renders as full stated confidence. A validated log-probability proxy reads the latent decision variable directly and beats a fitted scalar-surprisal model (ΔAIC +54, pooled n = 749), including on a matched-surprisal dissociation (Δ = +1.44) to which the field’s scalar measure is blind.

The Dunning-Kruger pattern is among the most cited and most contested findings in metacognition, and the dispute is largely about measurement. The load-bearing quantity (how strongly a person’s competing answers compete before commitment) is never observed directly; it is reconstructed from confidence and accuracy scores that are entangled with regression to the mean and scale use. This paper changes substrate. In a large language model that same quantity is read directly from the next-token probability distribution. The reading is first validated as a proxy for the sequential-sampling (race / leaky-competing-accumulator) decision variable; the proxy is then used as a model organism for the Dunning-Kruger curve, yielding a partial reproduction with principled divergences.

The confident-before-competent rise, the recognition near-tie, and the recalibration slope are architectural, and the rise and the recognition near-tie replicate on a second, structurally different hidden-rule task. Novice humility is absent under forced commitment but restored by an abstention channel; the affective valley is absent in behaviour yet present in the substrate margin. The paper thereby localises which phases of the human curve are properties of bounded competing-route computation and which require the biological substrate, and it closes with the preregistered human study that would test the analogy directly.

New in version 2. A worked application of the measurement model: self-efficacy, a capability judgement studied in people, is, in the substrate, a capacity-gated readout of the same pre-commitment competing-routes race. The friction that degrades an answer and the signal that reports confidence in it are demonstrably one quantity. They decouple only in smaller models, which keep the degrader but cannot read the reporter. The section is positioned head-on against the confidence-readout and self-efficacy literatures. A per-field generalisation is added as an explicit, falsifiable prediction.

Companion papers (Friction Theory series): Behavioural Friction Theory (Paper 0); Friction as the Cost of Probabilistic Computation (Paper 1); Logic as Reactance (Paper 14); The Physics of Learning (Paper 16); Compete, Don’t Erase (knowledge-editing erase-vs-mask); An LLM as a Controllable, Fully-Inspectable Model of Measurement; The Delta (LLM as a subtraction-control for human neuroscience); and Nature and Nurture in a Language Model (installable value fields and intrinsic capacity).

Series position. Paper 21 in the Friction Theory paper series. Data, prompts, scripts, and per-item outputs are released with the paper (see the Data and code availability section).
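The core measurement idea in the abstract, reading the competition between candidate answers from the next-token distribution while greedy decoding hides it, can be illustrated with a minimal sketch. The abstract does not give the exact definition of the proxy, so the version here (top-1 minus top-2 log-probability, in nats) is an assumption, and the function names and logit values are hypothetical.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities (natural log, so units are nats)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def evidence_margin(logits):
    """ASSUMED proxy: log p(best answer) - log p(runner-up), in nats.

    A small value means the two leading candidates are nearly tied.
    """
    logp = sorted(log_softmax(logits), reverse=True)
    return logp[0] - logp[1]


def greedy_choice(logits):
    """Greedy decoding: commit to the argmax, regardless of how close the race was."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits over three candidate answers (illustrative values only).
near_tie = [2.0, 1.6, -1.0]   # leading candidates separated by about 0.4 nats
clear_win = [5.0, 1.0, -1.0]  # leading candidate far ahead

for name, logits in [("near_tie", near_tie), ("clear_win", clear_win)]:
    print(name, "choice:", greedy_choice(logits),
          "margin (nats):", round(evidence_margin(logits), 3))
```

Both cases commit to the same answer under greedy decoding, so the output text alone cannot distinguish them; only the margin separates the near-tie from the clear win. Under this assumed definition the margin equals the difference between the two largest logits, because the log-softmax normaliser cancels.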
Tomas Pødenphant Lund studied this question.