We present a unified framework for transformer interpretability and safety grounded in the geometry of residual stream operators: the inter-layer differences Δ_l = h_{l+1} − h_l, which directly capture what each layer contributes to the forward pass. We make five empirical contributions validated across four models spanning three architectural families and an 80× parameter range (GPT-2 117M through Qwen3.5-9B).
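To make the definition concrete, here is a minimal sketch of computing the inter-layer differences from a list of per-layer hidden states. The function name `layer_deltas` and the toy values are illustrative assumptions, not taken from the paper; a real model would supply one hidden-state vector per layer and token position.

```python
def layer_deltas(hidden_states):
    """Return the differences h[l+1] - h[l] for consecutive layers.

    hidden_states: list of equal-length vectors, one per layer
    (hypothetical input format chosen for this sketch).
    """
    return [
        [b - a for a, b in zip(h_l, h_next)]
        for h_l, h_next in zip(hidden_states, hidden_states[1:])
    ]


# Toy residual stream: an input state plus three layer outputs, width 3.
hidden_states = [
    [1.0, 0.0, 2.0],
    [1.5, -1.0, 2.0],
    [1.5, 0.5, 3.0],
    [0.0, 0.5, 4.0],
]

deltas = layer_deltas(hidden_states)

# The differences telescope: their sum equals the last state minus the first.
total = [sum(column) for column in zip(*deltas)]
net_change = [b - a for a, b in zip(hidden_states[0], hidden_states[-1])]
```

Because the differences telescope, `total` matches `net_change`, which is one way to check that the per-layer contributions account for the whole change across the stack.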
Sanskar Pandey
Sanskar Pandey (Fri,) studied this question.
www.synapsesocial.com/papers/69bf393dc7b3c90b18b43bb2 — DOI: https://doi.org/10.5281/zenodo.19135348