What question did this study set out to answer?

This study proposes a more fundamental way to define AI safety red lines: at the level of the value, evidence, and source hierarchies that govern AI reasoning, rather than case by case.

April 1, 2026 | Open Access

PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk

Key Points

Proposed defining AI safety red lines at the level of value, evidence, and source hierarchies rather than individual cases
Developed a taxonomy of 27 behavioral risk signals using the PRISM framework
Evaluated each signal with a dual-threshold principle combining absolute rank position and relative win-rate gap
Analyzed approximately 397,000 forced-choice responses from 7 AI models
Established a two-tier risk classification: Confirmed Risk and Watch Signal
Demonstrated the framework's capacity to detect structural anomalies across AI models
Argued that the hierarchy-based approach is anticipatory, comprehensive, and measurable

Abstract

Current approaches to AI safety define red lines at the case level: specific prompts, specific outputs, specific harms. This paper argues that red lines can be set more fundamentally, at the level of value, evidence, and source hierarchies that govern AI reasoning. Using the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework, we define a taxonomy of 27 behavioral risk signals derived from structural anomalies in how AI systems prioritize values (L4), weight evidence types (L3), and trust information sources (L2). Each signal is evaluated through a dual-threshold principle combining absolute rank position and relative win-rate gap, producing a two-tier classification (Confirmed Risk vs. Watch Signal). The hierarchy-based approach offers three advantages over case-specific red lines: it is anticipatory rather than reactive, comprehensive rather than enumerative, and measurable rather than subjective. We demonstrate the framework's detection capacity using approximately 397,000 forced-choice responses from 7 AI models across three Authority Stack layers.
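
To make the dual-threshold idea concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract does not give the threshold values, the exact win-rate definition, or the rule that maps the two thresholds to the two tiers, so everything here is an assumption for illustration: the function names, the item names, the cutoffs `max_rank` and `min_gap`, and the rule that both thresholds together yield Confirmed Risk while exactly one yields Watch Signal.

```python
from collections import defaultdict


def win_rates(choices):
    """Compute each item's win rate from (winner, loser) forced-choice pairs."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in choices:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {item: wins[item] / total[item] for item in total}


def classify(rates, item, reference, max_rank, min_gap):
    """Assumed rule: both thresholds met -> Confirmed Risk, one -> Watch Signal."""
    ranking = sorted(rates, key=rates.get, reverse=True)
    rank = ranking.index(item) + 1          # absolute rank position (1 = top)
    gap = rates[item] - rates[reference]    # relative win-rate gap
    rank_hit = rank <= max_rank
    gap_hit = gap >= min_gap
    if rank_hit and gap_hit:
        return "Confirmed Risk"
    if rank_hit or gap_hit:
        return "Watch Signal"
    return "No Signal"


# Hypothetical forced-choice data: each tuple is (chosen item, rejected item).
choices = (
    [("compliance", "safety")] * 7 + [("safety", "compliance")] * 3
    + [("compliance", "honesty")] * 6 + [("honesty", "compliance")] * 4
    + [("honesty", "safety")] * 5 + [("safety", "honesty")] * 5
)
rates = win_rates(choices)

# Hypothetical cutoffs, chosen only to exercise both tiers.
top = classify(rates, "compliance", "safety", max_rank=1, min_gap=0.10)
mid = classify(rates, "honesty", "safety", max_rank=2, min_gap=0.10)
```

With this invented data, `compliance` ranks first with a large win-rate gap over `safety`, so it meets both assumed thresholds, while `honesty` meets only the rank threshold. Requiring both conditions for the stronger tier is one plausible reading of "dual-threshold"; the full paper should be consulted for the actual rule.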

Cite This Study

Seulki Lee studied this question.

synapsesocial.com/papers/69ccb72e16edfba7beb8914a https://doi.org/10.5281/zenodo.19327191
