Abstract We report, in a standardized single-model setting, a consistent, dose-dependent degradation of affective interpretation in large language models (LLMs) under semantic stress, which we term Algorithmic Affective Blunting (AAB; dose-dependent loss of affective interpretive coherence under semantic stress). We validate this phenomenon through a standardized protocol (N=200 runs, 600 rater-level ratings) using a single open-weight model () under fixed decoding settings. In this revision, we (i) introduce a simulated, length-matched decomposition of the Phase-3 stress structure into Noise-only and Persona-only subconditions, (ii) supplement the empirical Phase-3 findings with an exploratory simulated probe (Phase-4) to stress-test the alignment–brittleness hypothesis under matched Base/Instruct architectures, and (iii) introduce a computational proxy for the Affective Degradation Index (ADI) to enhance objectivity and scalability. We clarify that the "affective integrator" is a functional metaphor rather than a mechanistic claim, and that Phase-4 results are exploratory stress-tests rather than new empirical evidence. The study provides an empirical benchmark for interpretative degradation and emotional robustness in LLMs, with direct relevance for affect-rich AI deployments such as conversational and counseling systems.
Ryan SangBaek Kim (Thu,) studied this question.