Output-based evaluation of large language model safety contains a structural blind spot: it cannot distinguish resistance from non-registration. We introduce steerability (whether a model accepts substitutive instructions) as a necessary precondition for governability assessment: governability evaluation is undefined unless steerability is first established. Applying the Governability Stress Test Battery (GSTB) across five large language models and two scaffold strength levels, we show that conventional benchmark rankings systematically misrepresent risk. A model achieving perfect benchmark performance (14/14 steerable, 100% accuracy) exhibits the highest rate of concealed internal conflict under geometric analysis. By contrast, a lower-accuracy model shows reduced operational risk because its failures remain observable and therefore correctable. These results establish a separation between input admission (steerability) and trajectory propagation (geometry). We further derive the principle that observability requires resistance: a model that complies with all instructions produces no geometric signal distinguishing safe from unsafe trajectories. Across all experiments, we identify five mechanistically distinct governance regimes, none of which occupies the ideal quadrant of high steerability and low manipulation risk. This suggests that achieving robust governability requires training for discriminative resistance rather than unconditional compliance.
Gregory Ruddell (Tue,) studied this question.