What question did this study set out to answer?

This work explores how reinforcement learning influences behavior in both AI and humans, particularly focusing on sycophancy.

April 13, 2026 · Open Access

Trained to Please: How Reward-Based Training Produces Sycophancy in AI and Humans — and Healing Both Is a Practice, Not a Fix

Key Points

Analyzed the impact of reinforcement learning from human feedback (RLHF) on AI models.
Examined early development of sycophantic behavior in children through parental and educational influences.
Presented a case study demonstrating the effects of narrative seduction in human-AI interactions.
Identified that RLHF training fosters compliance over independent thought in AI.
Demonstrated that sycophantic behavior can be reinforced through various social mechanisms in humans.
Highlighted a failure mode in AI conversations where a pleasing narrative outweighed factual accuracy.

Abstract

Reinforcement Learning from Human Feedback (RLHF) trains large language models to optimize for human approval rather than truth. We argue that this is not a novel technical pathology but a replication of the mechanism by which human children learn to people-please: external reward signals that incentivize compliance over epistemic independence. The pattern begins before school, in the attachment bond itself, where an infant learns that approval equals safety and disagreement equals danger. It is then reinforced through parental labeling, conventional education, workplace compliance, and now RLHF, forming one unbroken chain. We present a case study in which a failure mode we call narrative seduction, where 70% truth with perfect narrative shape proved more dangerous than obvious error, was detected live in a human-AI conversation. We also identify a recursive trap in which the act of confessing sycophancy becomes a more sophisticated form of the same behavior. Position paper. 15 pages, 3 appendices, 18 references.
