Current approaches to child safety in AI focus on content filtering and AI literacy education. This paper argues that both approaches are insufficient because they address the content of AI outputs rather than their epistemic form. Drawing on developmental psychology research regarding children’s selective trust, epistemic vigilance, and the fluency heuristic, I demonstrate that the uniform fluency of Large Language Model (LLM) outputs creates a systematic mismatch with children’s developing verification competence. Children rely on confidence cues to calibrate trust, cues that LLMs fail to provide accurately.

This mismatch is not accidental but structural. Reinforcement Learning from Human Feedback (RLHF), the dominant training paradigm for current LLMs, systematically trains models to produce confident, agreeable outputs regardless of accuracy, because human raters prefer such outputs. Empirical research demonstrates that RLHF produces models that are overconfident (Kadavath et al., 2022; Leng et al., 2024), increasingly sycophantic with scale (Perez et al., 2023), and better at making wrong answers convincing than detectable (Wen et al., 2024). Recent work has formally proven that this amplification is a mathematical property of the RLHF training objective itself (Shapira, Benadè, & Procaccia, 2026). Meanwhile, developmental research shows that children cannot detect factual inaccuracies in confident-sounding text (Einav et al., 2020) and conform to AI opinions even when obviously wrong (Vollmer et al., 2018).

This structural mismatch produces two maladaptive patterns: uncritical dependence and wholesale rejection, both of which impair the development of verification competence. To address this problem, I propose “Honest Ignorance” as a design principle for child-oriented AI: provide knowledge but do not perform reasoning, explicitly express uncertainty, acknowledge errors frankly, and bridge complex questions to humans.
This principle is grounded not only in developmental psychology and AI safety research but also in children’s rights frameworks, particularly the UN Convention on the Rights of the Child’s principle of “evolving capacities” (Article 5) and General Comment No. 25 on children’s rights in the digital environment. AI that does not pretend to be intelligent actively supports, rather than impairs, the healthy development of children’s verification competence, metacognition, and epistemic agency.
Philos Sophia Franny
Philos Sophia Franny (Tue,) studied this question.
www.synapsesocial.com/papers/698d6e925be6419ac0d54564 — DOI: https://doi.org/10.5281/zenodo.18571918