Can a single token-level signal read from a language model’s log-probabilities (the count of competing routes, the commit margin in nats, or the first-token entropy) serve as a readable correlate of how the model resolves a social-moral judgment? This paper reads such a signal across three controlled experiments and delimits what it does and does not establish: the signal is a substrate-level correlate of present route-competition, not an established causal mechanism and not a claim that the model feels anything. (E1) Third-party moral commitment is morality-specific and lexeme-robust, with a friction difference that appears in the ambiguous region where the judgment is not pre-resolved. (E2) Content-invariant binary rule-evaluation generalises up an abstraction gradient at capability; this is a capability control that bounds the signal, not a social-moral signature. (E3) A disadvantaged agent’s disengagement is driven by comparison and expectation rather than reward amount, with a human-like advantaged/disadvantaged asymmetry. Read internally rather than at the output token, the inequity reaction is strongly predicted by a non-morally-defined internal disparity axis (a gap-controlled, context-general dose-response, made non-circular by construction), consistent with a fairness reaction that imports another agent’s unresolved competition rather than reciting a learned moral script. Every contrast carries a bootstrap confidence interval, and a lexeme-invariance check distinguishes robust effects from answer-token artefacts. The limits (two models, pilot scale, and a correlational interpretability readout for the internal result) are stated rather than hidden. Companion papers in the series develop the underlying friction theory, the forward-modelling and operational accounts, the competing-routes measurement-model programme, and the mechanism home for the mirror-friction reading of fairness.
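To make the three token-level signals and the bootstrap interval concrete, the following is a minimal Python sketch under stated assumptions, not the paper's analysis code. The function names, the `route_threshold` cut-off used to count competing routes, the renormalisation over the returned candidates, and the percentile bootstrap over a difference in means are all illustrative choices that the abstract does not specify.

```python
import math
import random


def first_token_signals(logprobs, route_threshold=0.05):
    """Compute illustrative first-token signals from candidate log-probabilities.

    `logprobs` maps each candidate first token to its natural-log probability.
    `route_threshold` is a hypothetical cut-off: a candidate counts as a
    competing route if its renormalised probability is at least this value.
    """
    # Renormalise over the returned candidates (assumption: the API only
    # exposes a top-k slice, so the raw probabilities need not sum to 1).
    peak = max(logprobs.values())
    weights = {tok: math.exp(lp - peak) for tok, lp in logprobs.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Shannon entropy in nats over the renormalised candidates.
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)

    # Commit margin: log-probability gap between the top two candidates (nats).
    ranked = sorted(logprobs.values(), reverse=True)
    margin = ranked[0] - ranked[1] if len(ranked) > 1 else float("inf")

    # Count of competing routes under the assumed threshold.
    routes = sum(1 for p in probs.values() if p >= route_threshold)

    return {
        "entropy_nats": entropy,
        "commit_margin_nats": margin,
        "competing_routes": routes,
    }


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each condition with replacement, keeping its sample size.
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

In this sketch a contrast between two conditions would be summarised by calling `first_token_signals` on each item's first-token log-probabilities and then passing the per-item values of one signal (for example `commit_margin_nats`) for the two conditions to `bootstrap_ci`.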
Prepared for submission to Transactions on Machine Learning Research (TMLR). Data and code. The stimulus generators, probes, re-analysis scripts, and per-token log-probability outputs are available from the author.
Tomas Pødenphant Lund studied this question.