What question did this study set out to answer?

The aim is to determine if transformer hidden states exhibit coherent phase structures and how to extract them effectively.

April 3, 2026 · Open Access

Geometric Phase Extraction from Transformer Hidden States: Architecture-Dependent Manifold Structure and Adaptive Observation Protocols

Key Points

The aim is to determine if transformer hidden states exhibit coherent phase structures and how to extract them effectively.
Tested the standard phase-extraction pipeline (PCA, bandpass filter, Hilbert transform) on GPT-2 and eight other models spanning 110M–2.8B parameters.
Proposed a geometric extraction technique: project hidden states onto their first two principal components and compute the angle with atan2.
On Pre-LayerNorm models the geometric method reached R-bar = 0.93–0.98, roughly 8x the R-bar ≈ 0.12 obtained with the standard Hilbert pipeline on GPT-2.
LayerNorm placement was identified as the key variable affecting extraction efficacy.
Established an adaptive observation protocol that selects the extraction method from the PCA variance explained at k=2 (ρ₂): geometric extraction if ρ₂ > 0.80, wide-bandpass Hilbert otherwise.
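The geometric extraction named in the key points can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's code: the function names `geometric_phase` and `mean_resultant_length` are hypothetical, the mean-centering step is an assumption (the summary does not say whether the authors center before PCA), and R-bar is assumed to be the mean resultant length from circular statistics. The summary also does not state which set of samples R-bar is computed over, so the sketch leaves that choice to the caller.

```python
import numpy as np


def geometric_phase(hidden, center=True):
    """Project hidden states onto their top two principal directions.

    hidden: array of shape (n_samples, d_model).
    center: subtract the per-dimension mean first (an assumption here).
    Returns (theta, rho2): angles in [-pi, pi] and the fraction of
    variance captured by the first two components.
    """
    hidden = np.asarray(hidden, dtype=float)
    if center:
        hidden = hidden - hidden.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(hidden, full_matrices=False)
    proj = hidden @ vt[:2].T
    theta = np.arctan2(proj[:, 1], proj[:, 0])
    energy = s ** 2
    rho2 = float(energy[:2].sum() / energy.sum())
    return theta, rho2


def mean_resultant_length(theta):
    """R-bar = |mean(exp(i * theta))|; 1 means fully aligned angles."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(theta)))))
```

Calling `geometric_phase` on a `(n_tokens, d_model)` array taken from one layer returns one angle per token along with ρ₂. Note that the angles are only defined up to a rotation or reflection of the principal axes, so comparisons between runs should use angle differences rather than raw values.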

Abstract

Geometric Phase Extraction from Transformer Hidden States

What this is

Code and data for a paper that asks: do Transformer hidden states have coherent angular (phase-like) structure, and if so, how do you extract it? Short answer: yes, but only if you pick the right method for the right architecture.

The problem

The standard signal-processing approach to phase extraction (PCA, bandpass filter, Hilbert transform) assumes oscillatory dynamics. Transformers are feedforward, not recurrent. We tested this pipeline on GPT-2 and got R-bar ≈ 0.12, which is indistinguishable from noise. The conventional approach simply doesn't work here.

What we found

A geometric method works. Project hidden states onto their first two principal components, compute the angle via atan2. On Pre-LayerNorm models (GPT-2, Qwen2, Pythia, most OPT variants), this gives R-bar = 0.93–0.98, roughly 8x better than Hilbert.

LayerNorm placement is the key variable. GPT-1 (Post-LN) and GPT-2 (Pre-LN) have nearly identical architectures (768-dim, 12 layers, ~120M params). The only real difference is where LayerNorm goes. PCA concentration at k=2: 16% vs 96%. That 6x gap is reproducible and shows up again in the OPT family (OPT-350m vs OPT-125m Pre-LN).

For low-concentration models, a wide-bandpass Hilbert variant works as a fallback. Passband [0.01, 0.45] instead of the standard [0.05, 0.25]. This gets R-bar = 0.60–0.94 across all nine models we tested, including OPT-1.3B where the geometric method underperforms.

You can pick the method automatically. Compute PCA variance explained at k=2 (we call it ρ₂). If ρ₂ > 0.80, use geometric extraction. Otherwise, use wide-bandpass Hilbert. That's the whole protocol.

What's in this repository

Paper: LaTeX source and compiled PDF (23 pages, arXiv-formatted)
13 experiments (7 core + 6 supplementary), all as standalone Python scripts
All generated figures (PNG) and raw data (JSON) for full reproducibility
run_all.py: single command to reproduce everything

Models tested

Nine models, 110M–2.8B parameters: GPT-1, GPT-2, OPT-125m/350m/1.3B/2.7B, Qwen2-0.5B/1.5B, Pythia-2.8B. All downloaded automatically from HuggingFace Hub.

Reproducibility

    python3 -m venv .venv && source .venv/bin/activate
    pip install -r experiments/requirements.txt
    python experiments/run_all.py

Runs on consumer hardware. Tested on Apple M1, 16GB. Total runtime ~45 minutes. No GPU required.

Why it matters

For interpretability researchers: Pre-LN hidden states live on a ~2D manifold at middle layers. Angular position on that manifold is a new, unsupervised observable; no labeled data or probes needed.

For practitioners: The three-tier architecture classification (Post-LN / Pre-LN OPT / Pre-LN non-OPT) has practical implications for compression and low-rank approximation strategies.

For the multi-agent crowd: Phase coherence gives you a scalar, architecture-comparable quantity for monitoring alignment across LLM instances.

Related papers

This paper provides the theoretical foundation for the Recync framework, runtime coherence control for LLMs:

From Monitoring to Intervention (detection + token-level control limits): doi.org/10.5281/zenodo.19148449
Beyond Micro-Control (response-level checkpoint restart breakthrough): doi.org/10.5281/zenodo.19148721

Code

This repository: github.com/metaSATOKEN/geometric_phase_extraction
Recync framework (Paper 2 & 3): github.com/metaSATOKEN/Recyncframework

License

Paper content: CC BY 4.0
Code: Apache License 2.0

Copyright 2026 Kentaro Sato.
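The selection rule stated in the abstract (geometric extraction if ρ₂ > 0.80, wide-bandpass Hilbert otherwise) is simple enough to write down directly. The sketch below uses only the Python standard library; the names `variance_explained_at_k`, `select_method`, and the returned method labels are hypothetical and are not taken from the repository. It assumes ρ₂ is the share of total variance held by the two largest principal components, and that the comparison is strict, as the abstract's wording suggests.

```python
from typing import Sequence

RHO2_THRESHOLD = 0.80  # threshold quoted in the abstract


def variance_explained_at_k(variances: Sequence[float], k: int = 2) -> float:
    """Fraction of total variance held by the k largest components."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    top = sorted(variances, reverse=True)[:k]
    return sum(top) / total


def select_method(rho2: float, threshold: float = RHO2_THRESHOLD) -> str:
    """Pick an extraction method from rho2 (labels are illustrative)."""
    return "geometric" if rho2 > threshold else "wide_bandpass_hilbert"
```

With the concentrations reported above, `select_method(0.96)` (the GPT-2 figure) selects the geometric method and `select_method(0.16)` (the GPT-1 figure) selects the wide-bandpass Hilbert fallback.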
