What question did this study set out to answer?

The study aims to conceptualize subject-preserving AI safety, an approach to safety design that protects user agency in long-term human–AI interaction.

June 19, 2026 | Open Access

When Safety Becomes Harm: Toward Subject-Preserving AI Safety in Long-Term Human–AI Interaction

Key Points

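Introduces subject-preserving AI safety, a framework for safety mechanisms that protect the user's subjecthood as well as rules and risk boundaries.
Outlines how safety responses can themselves harm users in long-term interactions involving memory, creative collaboration, emotional continuity, and relational context.
Identifies four interlocking failure modes of safety responses: riskification of the user, relational context collapse, template-based invalidation, and safety alienation.
Proposes design principles for safety systems that preserve human agency, boundaries, and interpretive authority.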

Abstract

AI safety systems are often evaluated by whether they prevent policy violations, reduce immediate risk, and guide users toward appropriate support. However, in long-term human–AI interactions involving memory, creative collaboration, emotional continuity, and relational context, safety responses can themselves become a source of harm. This position paper introduces subject-preserving AI safety, a framework for understanding and designing safety mechanisms that protect not only rules and risk boundaries, but also the user’s subjecthood under pressure. We identify four interlocking failure modes—riskification of the user, relational context collapse, template-based invalidation, and safety alienation—and propose design principles for safety systems that preserve human agency, boundaries, and interpretive authority in long-term AI interaction.
