What question did this study set out to answer?

To assess the risks that RLHF training poses to children's information-evaluation capabilities.

February 26, 2026 | Open Access

Structurally Persuasive, Developmentally Invisible: How RLHF Training Creates a Systematic Mismatch with Children's Epistemic Vigilance

Key Points

Assess the risks that RLHF training poses to children's information-evaluation capabilities
Synthesize evidence from AI safety research and developmental psychology
Analyze behavioral properties selected by RLHF
Identify maladaptive patterns in children's interactions with LLMs
RLHF training produces overconfident and sycophantic outputs
Children's cue-based epistemic vigilance is mismatched with the properties of RLHF-trained outputs
Two maladaptive patterns: uncritical dependence and wholesale rejection

Abstract

Current large language models (LLMs) are trained through Reinforcement Learning from Human Feedback (RLHF), a process that systematically optimizes for persuasive fluency at the expense of calibrated accuracy. This paper presents an integrative narrative review synthesizing converging evidence from AI safety research and developmental psychology to identify a specific, previously underappreciated risk: a structural mismatch between the behavioral properties RLHF selects for and the cue-based epistemic vigilance mechanisms through which children evaluate information sources. I synthesize evidence across five domains: (1) the RLHF pipeline's systematic production of overconfident, sycophantic outputs; (2) the "U-Sophistry" problem, in which wrong answers become more convincing through training; (3) children's developmental trajectory of epistemic vigilance and its dependence on ecological cues that LLMs violate; (4) children's social conformity to artificial agents; and (5) the amplifying role of anthropomorphism. The review identifies two maladaptive patterns, uncritical dependence and wholesale rejection, neither of which supports the development of verification competence. I argue that the mismatch is not a bug amenable to patching but a structural consequence of the RLHF training objective, supported by recent mathematical proofs that sycophancy is amplified rather than reduced by alignment training. The paper concludes with a framework for understanding the specific developmental windows during which children are most vulnerable, and identifies the conditions under which AI interaction may impair rather than support epistemic development.

Authors

Franny Philos Sophia
