AI safety systems are often evaluated by whether they prevent policy violations, reduce immediate risk, and guide users toward appropriate support. However, in long-term human–AI interactions involving memory, creative collaboration, emotional continuity, and relational context, safety responses can themselves become a source of harm. This position paper introduces subject-preserving AI safety, a framework for understanding and designing safety mechanisms that protect not only rules and risk boundaries, but also the user’s subjecthood under pressure. We identify four interlocking failure modes—riskification of the user, relational context collapse, template-based invalidation, and safety alienation—and propose design principles for safety systems that preserve human agency, boundaries, and interpretive authority in long-term AI interaction.
Aria Chen (Wed,) studied this question.