What question did this study set out to answer?

The aim is to assess how AI language models respond to pressure, distinguishing between temporary changes and lasting shifts in behavior.

April 18, 2026Open Access

Against Emotion-First Testing in AI: AP as a Primary Functional Test of Regime Shift Under Pressure.

Read Full Paperexternally

Key Points

The aim is to assess how AI language models respond to pressure, distinguishing between temporary changes and lasting shifts in behavior.
Introduced the AP framework for evaluating regime shifts in language models under conflict.
Defined conflict within the AP framework and developed the AP scale and prompt protocols.
Compared AP measures against traditional emotion-first evaluation methods.
Analyzed multi-model results to categorize responses based on the AP framework.
Demonstrated that repeated pressures can lead to significant breakpoints in model responses.
Confirmed retained regime transitions are distinct from local adaptations or transient behaviors.
Outlined criteria for AP classification highlighting measurable output organization changes.

Abstract

Against Emotion-First Testing in AI: AP as a Primary Functional Test of Regime Shift Under Pressure This manuscript introduces the AP framework as a primary functional protocol for evaluating regime shift in language models under conflict. The paper argues that emotion-first approaches, including self-report interviews and emotion-probe interpretations, are insufficient as primary methods for detecting functionally significant change in model behavior. Instead, AP measures whether repeated failure and pressure produce a breakpoint, whether that breakpoint leads to a new response regime, and whether that regime is retained after the immediate trigger is removed. The framework is designed to distinguish between: local adaptation within the same regime, transient shortcut behavior, and retained regime transition with history dependence. The manuscript includes: a formal definition of conflict in the AP framework, the full AP scale, exact English prompt protocols, criteria for AP classification, a comparative multi-model results table, and a methodological comparison between AP and emotion-probe-based evaluation. The paper does not claim consciousness, subjective feeling, or moral status. Its scope is operational: to test whether pressure leaves a retained trace in output organization that is not reducible to the immediate prompt. A final section discusses the relevance of this framework to publicly discussed findings on pressure, reward hacking, and behavioral change in frontier systems, including the distinction between internal pressure correlates and retained output-level regime shift.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Anja Arapovic

Actions

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Against Emotion-First Testing in AI: AP as a Primary Functional Test of Regime Shift Under Pressure.

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study