What question did this study set out to answer?

The study aims to develop a formal theory linking strategic frame control and learning in principal-agent scenarios using a Q-learning perspective.

April 20, 2026 · Open Access

The Mechanism of Manipulation: A Theory of Dynamically-Stabilized Certainty Traps, Strategic Frame Control, and the Learning Wedge

Key Points

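Develops a formal theory of repeated principal-agent interaction in which a Sender chooses an evaluative frame after observing the Receiver's action.
Proves the core implementability theorem for the dynamically-stabilized certainty trap (DSCT) with a Robbins–Siegmund stochastic approximation argument applied to the Q-learning recursion.
Examines how the set of feasible frame mixtures affects Receiver welfare and the persistence of engagement.
Shows that a DSCT at target value q† is achievable by a stationary Sender strategy if and only if q† lies in the closed convex hull of Receiver rewards reachable by feasible frame mixtures.
Shows that the optimal stabilizing strategy solves a linear program and, under a single mean-stabilization constraint, can be supported on at most two frames.
Proves that partial restriction of frame feasibility (removing interior frames while preserving extremes) strictly reduces risk-averse Receiver welfare.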

Abstract

This paper develops a formal theory of strategic post-action evaluation in a repeated principal–agent interaction where a Sender selects an evaluative frame after observing the Receiver's action. Frames govern contemporaneous payoffs and capture discretionary evaluation — subjective performance review, platform scoring, algorithmic reward shaping, and preference labeling. The Receiver is modeled as an adaptive bandit Q-learner rather than a Bayesian expected-utility maximizer, connecting to the learning-in-games tradition and to self-confirming equilibrium. The central object is the dynamically-stabilized certainty trap (DSCT): a stable regime in which the Sender holds the Receiver's learned value of engagement at the outside-option indifference point while engagement persists with positive frequency. The core implementability theorem establishes that a DSCT at target value q† is achievable by a stationary Sender strategy if and only if q† lies in the closed convex hull of Receiver rewards reachable by feasible frame mixtures — proved via a Robbins–Siegmund stochastic approximation argument applied to the Q-learning recursion. The optimal stabilizing strategy solves a linear program; under a single mean-stabilization constraint an optimal stabilizer exists supported on at most two frames, with support size bounded sharply by the rank of the active constraint matrix. With frame-switching costs, a shrinking-band hysteresis policy achieves Q-value convergence while driving long-run switching frequency to zero, and strictly dominates any stabilizer with positive asymptotic switching rate when switching costs are positive. Five structural extensions are established. A regulatory non-monotonicity theorem proves that partial restriction of feasibility — removing interior frames while preserving extremes — strictly reduces risk-averse Receiver welfare by forcing higher-variance bang-bang mixtures, even when Sender extraction is weakly lower. 
A learning-wedge theorem proves that DSCT is robustly implementable against Q-learners under small payoff perturbations but is not robustly implementable against Bayesian Receivers with correct priors, where exact indifference is destroyed by arbitrarily small perturbations, establishing the mechanism as learning-theoretic rather than equilibrium-theoretic. An identifiability-failure theorem for linear Q-learning shows that when engagement and outside-option feature vectors are collinear the target and outside-option values cannot be controlled independently, with a sharp rank condition on the feature-gram matrix. A Markov-modulated feasibility theorem characterizes the implementable set under ergodic Markov-varying constraints as the Minkowski average of state-contingent reachable sets weighted by the stationary distribution. A strategic RLHF theorem formalizes the DSCT mechanism for pairwise preference learning under the Bradley–Terry model, establishing that the set of reachable fixed points generically forms a manifold of dimension min(d, dim(B_label)), so the learned reward model is non-identified from preference data alone when the labeler is strategic. This version explicitly situates the general post-action frame-control theory relative to the author's prior work on predictive and reinforcement learning models of the double bind, from which the present paper abstracts to provide a domain-general implementability and optimal-stabilization theory.
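
As an illustration of the mean-stabilization idea in the abstract, the following minimal Python sketch simulates a Q-learner whose reward is drawn from a two-frame mixture chosen so that the mean reward equals the target q†. It is not the paper's construction: the function names, the reward values, the 1/t step size, and the assumption that the Receiver engages in every period are all hypothetical choices made for the example.

```python
import random


def stabilizing_mixture(r_low, r_high, q_target):
    """Probability of the high-reward frame so the mean reward equals q_target.

    Hypothetical helper: assumes exactly two frames with Receiver rewards
    r_low <= r_high, so the reachable means form the interval [r_low, r_high].
    """
    if not (r_low <= q_target <= r_high):
        raise ValueError("q_target is outside the interval spanned by the two frames")
    if r_high == r_low:
        return 1.0
    return (q_target - r_low) / (r_high - r_low)


def simulate_q(r_low, r_high, q_target, steps=20000, q0=0.0, seed=0):
    """Run the recursion Q <- Q + alpha_t * (reward - Q) with alpha_t = 1/t.

    Simplifying assumption: the Receiver engages every period, so the
    engagement value is updated at every step.
    """
    rng = random.Random(seed)
    p_high = stabilizing_mixture(r_low, r_high, q_target)
    q = q0
    for t in range(1, steps + 1):
        # Stationary Sender strategy: pick the frame after the action is taken.
        reward = r_high if rng.random() < p_high else r_low
        alpha = 1.0 / t  # step sizes sum to infinity, squares are summable
        q += alpha * (reward - q)
    return q


# Illustrative values only: two frames with rewards -1 and 2, target 0.5.
q_final = simulate_q(r_low=-1.0, r_high=2.0, q_target=0.5)
print(f"learned engagement value: {q_final:.3f}")
```

With these step sizes the learned value is the running average of the realized rewards, so it settles near the target whenever the target lies between the two frame rewards; a target outside that interval raises an error, mirroring the convex-hull condition in the two-frame case.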


Authors

Kevin Fathi
