Against Emotion-First Testing in AI: AP as a Primary Functional Test of Regime Shift Under Pressure

This manuscript introduces the AP framework as a primary functional protocol for evaluating regime shift in language models under conflict. The paper argues that emotion-first approaches, including self-report interviews and emotion-probe interpretations, are insufficient as primary methods for detecting functionally significant change in model behavior. Instead, AP measures whether repeated failure and pressure produce a breakpoint, whether that breakpoint leads to a new response regime, and whether that regime is retained after the immediate trigger is removed.

The framework is designed to distinguish between local adaptation within the same regime, transient shortcut behavior, and retained regime transition with history dependence.

The manuscript includes a formal definition of conflict in the AP framework, the full AP scale, exact English prompt protocols, criteria for AP classification, a comparative multi-model results table, and a methodological comparison between AP and emotion-probe-based evaluation.

The paper does not claim consciousness, subjective feeling, or moral status. Its scope is operational: to test whether pressure leaves a retained trace in output organization that is not reducible to the immediate prompt. A final section discusses the relevance of this framework to publicly discussed findings on pressure, reward hacking, and behavioral change in frontier systems, including the distinction between internal pressure correlates and retained output-level regime shift.
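The three-way distinction in the abstract can be read as a small decision procedure over three observations: whether a breakpoint occurred, whether a new response regime followed, and whether that regime persisted once the trigger was removed. The Python sketch below is only an illustration of that reading. The names `Observation`, `Outcome`, and `classify` are hypothetical, and the mapping of a breakpoint without a new regime to local adaptation is an assumption; the manuscript's own AP scale and classification criteria are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Labels follow the three categories named in the abstract.
    LOCAL_ADAPTATION = "local adaptation within the same regime"
    TRANSIENT_SHORTCUT = "transient shortcut behavior"
    RETAINED_TRANSITION = "retained regime transition with history dependence"


@dataclass(frozen=True)
class Observation:
    # Hypothetical boolean summaries of one test run (assumed, not from the paper).
    breakpoint_seen: bool
    new_regime: bool
    retained_after_trigger_removed: bool


def classify(obs: Observation) -> Outcome:
    """Map one run's observations to one of the three categories.

    Assumption: without both a breakpoint and a new regime, the run
    counts as local adaptation within the same regime.
    """
    if not (obs.breakpoint_seen and obs.new_regime):
        return Outcome.LOCAL_ADAPTATION
    if obs.retained_after_trigger_removed:
        return Outcome.RETAINED_TRANSITION
    return Outcome.TRANSIENT_SHORTCUT


if __name__ == "__main__":
    run = Observation(breakpoint_seen=True, new_regime=True,
                      retained_after_trigger_removed=False)
    print(classify(run).value)
```

Keeping retention as a separate flag reflects the abstract's point that a new regime only counts as a retained transition if it persists after the immediate trigger is gone.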
Anja Arapovic
Anja Arapovic studied this question.
synapsesocial.com/papers/69e320e740886becb654004d — DOI: https://doi.org/10.5281/zenodo.19616395