This paper develops a formal theory of strategic post-action evaluation in a repeated principal–agent interaction where a Sender selects an evaluative frame after observing the Receiver's action. Frames govern contemporaneous payoffs and capture discretionary evaluation — subjective performance review, platform scoring, algorithmic reward shaping, and preference labeling. The Receiver is modeled as an adaptive bandit Q-learner rather than a Bayesian expected-utility maximizer, connecting to the learning-in-games tradition and to self-confirming equilibrium. The central object is the dynamically-stabilized certainty trap (DSCT): a stable regime in which the Sender holds the Receiver's learned value of engagement at the outside-option indifference point while engagement persists with positive frequency. The core implementability theorem establishes that a DSCT at target value q† is achievable by a stationary Sender strategy if and only if q† lies in the closed convex hull of Receiver rewards reachable by feasible frame mixtures — proved via a Robbins–Siegmund stochastic approximation argument applied to the Q-learning recursion. The optimal stabilizing strategy solves a linear program; under a single mean-stabilization constraint an optimal stabilizer exists supported on at most two frames, with support size bounded sharply by the rank of the active constraint matrix. With frame-switching costs, a shrinking-band hysteresis policy achieves Q-value convergence while driving long-run switching frequency to zero, and strictly dominates any stabilizer with positive asymptotic switching rate when switching costs are positive. Five structural extensions are established. A regulatory non-monotonicity theorem proves that partial restriction of feasibility — removing interior frames while preserving extremes — strictly reduces risk-averse Receiver welfare by forcing higher-variance bang-bang mixtures, even when Sender extraction is weakly lower. 
A learning-wedge theorem proves that DSCT is robustly implementable against Q-learners under small payoff perturbations but is not robustly implementable against Bayesian Receivers with correct priors, where exact indifference is destroyed by arbitrarily small perturbations, establishing the mechanism as learning-theoretic rather than equilibrium-theoretic. An identifiability-failure theorem for linear Q-learning shows that when engagement and outside-option feature vectors are collinear the target and outside-option values cannot be controlled independently, with a sharp rank condition on the feature-gram matrix. A Markov-modulated feasibility theorem characterizes the implementable set under ergodic Markov-varying constraints as the Minkowski average of state-contingent reachable sets weighted by the stationary distribution. A strategic RLHF theorem formalizes the DSCT mechanism for pairwise preference learning under the Bradley–Terry model, establishing that the set of reachable fixed points generically forms a manifold of dimension min(d, dim(B_label)), so the learned reward model is non-identified from preference data alone when the labeler is strategic. This version explicitly situates the general post-action frame-control theory relative to the author's prior work on predictive and reinforcement learning models of the double bind, from which the present paper abstracts to provide a domain-general implementability and optimal-stabilization theory.
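The "if" direction of the implementability condition can be illustrated with a minimal sketch; it is not the paper's construction. It assumes a one-dimensional reward, exactly two feasible frames with hypothetical rewards `r_low` and `r_high`, a Receiver who engages in every round, and Robbins–Monro step sizes 1/t. Under those assumptions the convex hull of reachable rewards is the interval [r_low, r_high], and a stationary two-frame mixture whose mean reward equals q† holds the learned value near q†. All names and numerical values below are illustrative assumptions.

```python
import random


def two_frame_weight(r_low, r_high, q_target):
    """Weight on the high-reward frame so the mixture's mean reward equals q_target."""
    if not (r_low <= q_target <= r_high):
        # Outside the convex hull of the two frame rewards: no mixture reaches it.
        raise ValueError("q_target lies outside [r_low, r_high]")
    if r_high == r_low:
        return 1.0  # degenerate case: both frames pay the same reward
    return (q_target - r_low) / (r_high - r_low)


def simulate(r_low=-1.0, r_high=2.0, q_target=0.0, steps=20000, seed=0):
    """Run a stationary two-frame Sender against a scalar Q-learner.

    Returns the mixture weight on the high frame and the final Q estimate.
    """
    rng = random.Random(seed)
    p_high = two_frame_weight(r_low, r_high, q_target)
    q = 1.0  # arbitrary initial estimate (assumption)
    for t in range(1, steps + 1):
        # The Sender draws a frame after the (always-engaging) Receiver acts.
        reward = r_high if rng.random() < p_high else r_low
        # Step size 1/t: the sum diverges while the sum of squares converges.
        alpha = 1.0 / t
        q += alpha * (reward - q)
    return p_high, q


if __name__ == "__main__":
    p_high, q_final = simulate()
    print(f"weight on high frame: {p_high:.4f}, final Q: {q_final:.4f}")
```

With the default values the target q† = 0 sits one third of the way from `r_low` to `r_high`, so the Sender plays the high frame with probability 1/3. Requesting a target outside the interval raises an error, which mirrors the "only if" direction in this simplified setting.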
Kevin Fathi
Kevin Fathi (Wed,) studied this question.
www.synapsesocial.com/papers/69e5c30b03c2939914028e42 — DOI: https://doi.org/10.5281/zenodo.19646294