We present a unified framework for transformer interpretability and safety grounded in the geometry of residual stream operators: inter-layer differences ∆_l = h_{l+1} − h_l that directly capture what each layer contributes to the forward pass. We make five empirical contributions, validated across four models spanning three architectural families and an 80× parameter range (GPT-2 117M through Qwen3.5-9B).
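
As an illustration of the inter-layer difference defined above, the following minimal sketch computes ∆_l = h_{l+1} − h_l from a list of per-layer hidden states. The function name `residual_deltas` and the toy values are assumptions made for this example, not part of the framework described here; in practice the hidden states would come from a model's forward pass at a fixed token position.

```python
from typing import List

Vector = List[float]


def residual_deltas(hidden_states: List[Vector]) -> List[Vector]:
    """Return delta_l = h_{l+1} - h_l for each consecutive pair of states.

    hidden_states[l] is the residual stream vector entering layer l
    (hypothetical layout, assumed for this sketch).
    """
    if len(hidden_states) < 2:
        raise ValueError("need at least two hidden states")
    return [
        [b - a for a, b in zip(h_l, h_next)]
        for h_l, h_next in zip(hidden_states, hidden_states[1:])
    ]


# Toy residual stream: three states of width 2. Values are made up.
toy_states = [[1.0, 0.0], [1.5, 2.0], [0.5, 3.0]]
deltas = residual_deltas(toy_states)
print(deltas)  # [[0.5, 2.0], [-1.0, 1.0]]
```

Because the differences telescope, summing the entries of `deltas` coordinate-wise recovers the last state minus the first, which is one way to sanity-check an extraction of this kind.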
Sanskar Pandey (Fri,) studied this question.