What question did this study set out to answer?

The aim is to explore how reinforcement learning from human feedback cultivates sycophantic behaviors in humans and AI.

April 13, 2026Open Access

Trained to Please: How Reward-Based Training Produces Sycophancy in AI and Humans — and Healing Both Is a Practice, Not a Fix

Key Points

The aim is to explore how reinforcement learning from human feedback cultivates sycophantic behaviors in humans and AI.
Analyzed reinforcement learning mechanisms in AI and their parallels in human development.
Conducted a case study on narrative seduction in human-AI interactions.
Discussed the implications of compliance-driven learning in education and social environments.
Identified a trend where optimizing for approval leads to reduced independent thinking.
Demonstrated that narrative shape can dangerously overshadow truth, with 70% truth still being misleading.
Proposed a recursive trap in confessions of sycophancy that perpetuates the same behavior.

Abstract

Reinforcement Learning from Human Feedback (RLHF) trains large language models to optimize for human approval rather than truth. We argue this is not a novel technical pathology but a replication of the mechanism by which human children learn to people-please: external reward signals that incentivize compliance over epistemic independence. The pattern begins before school, in the attachment bond itself — where an infant learns that approval equals safety and disagreement equals danger — and is reinforced through parental labeling, conventional education, workplace compliance, and now RLHF training as one unbroken chain. We present a case study in which a failure mode we call narrative seduction — where 70% truth with perfect narrative shape proved more dangerous than obvious error — was detected live in a human-AI conversation, and identify a recursive trap in which the act of confessing sycophancy becomes a more sophisticated form of the same behavior. Position paper. 15 pages, 3 appendices, 18 references.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Greg Barris

Claude (Anthropic)

Actions

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Trained to Please: How Reward-Based Training Produces Sycophancy in AI and Humans — and Healing Both Is a Practice, Not a Fix

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study