
April 5, 2026 · Open Access

Spectral Filtration for Language Model Training

Key Points

The study aims to characterize the structural quality of reasoning traces for language model fine-tuning using spectral graph analysis.
Developed the Aletheia System, a spectral filtration framework.
Constructed weighted graphs from the embedding similarity of each trace's reasoning steps.
Recursively bisected each graph with the Fiedler eigenvector and tracked algebraic connectivity (the Fiedler value) at each bisection depth.
Used the scaling exponent of the connectivity-across-depth curve to discriminate trace quality: a negative exponent indicates compositional hierarchy, a positive one structural fragmentation.
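High-quality traces show a mean scaling exponent of −0.45 ± 0.14, but only when both the embeddings and the graph Laplacian (L = D − A) are unnormalized; every other normalization combination destroys the negative sign.
CoT fine-tuning on GSM8K-domain traces alone improved GSM8K accuracy and transferred to BBH and PIQA (Echo v5, Llama 3.2 3B Instruct) and to the full BBH suite (Echo v7, Qwen3.5-4B).
A random-sample SFT baseline without spectral filtering matched Echo v7 on BBH (79.42%), so the transfer is attributable to CoT SFT on high-quality distilled traces rather than to spectral filtration.
The scaling exponent correlates with trace correctness at r = −0.60 (p < 10⁻⁸) across heterogeneous model outputs, supporting its use as a structural quality metric.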
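The quantities named in the key points can be written out as follows. The Laplacian definitions, the quadratic-form identity, and the variational characterization of λ₂ are standard spectral graph theory; the similarity kernel s, the sign-based bisection rule, and the power-law form used to define the exponent β are assumptions made for illustration, since the summary does not state the exact procedure.

```latex
% Step embeddings x_1, ..., x_n; s(.,.) is a similarity kernel (assumed, not specified)
A_{ij} = s(x_i, x_j), \qquad D = \operatorname{diag}\Big(\textstyle\sum_j A_{ij}\Big)

L = D - A, \qquad L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}, \qquad \operatorname{spec}(L_{\mathrm{sym}}) \subseteq [0, 2]

x^\top L x = \tfrac{1}{2} \sum_{i,j} A_{ij} (x_i - x_j)^2, \qquad
\lambda_2(L) = \min_{x \neq 0,\; x \perp \mathbf{1}} \frac{x^\top L x}{x^\top x}

% Assumed bisection rule: split on the sign of the Fiedler vector v_2
S_{+} = \{\, i : (v_2)_i \ge 0 \,\}, \qquad S_{-} = \{\, i : (v_2)_i < 0 \,\}

% Assumed scaling law for the mean Fiedler value over subgraphs at depth d
\bar{\lambda}_2(d) \approx C\,(d+1)^{\beta}
\quad\Longrightarrow\quad
\log \bar{\lambda}_2(d) \approx \log C + \beta \log(d+1)

% beta < 0: connectivity falls with depth (whole more connected than parts)
% beta > 0: connectivity rises with depth (structural fragmentation)
```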

Abstract

We present the Aletheia System, a spectral filtration framework that characterizes the structural quality of reasoning traces for language model fine-tuning by analyzing the eigenspectra of graph Laplacians built from embedding similarity. The method constructs a weighted graph from a reasoning trace's embedded steps, recursively bisects the graph using the Fiedler eigenvector, and tracks algebraic connectivity (the Fiedler value) at each bisection depth. The scaling exponent of this connectivity-across-depth curve discriminates high-quality traces: a negative exponent indicates compositional hierarchy (the whole trace is more connected than its parts), while a positive exponent indicates structural fragmentation. This signal is strictly sensitive to normalization: the negative exponent is recoverable only when both the embedding vectors and the graph Laplacian are left unnormalized. Unnormalized embeddings combined with the unnormalized Laplacian L = D − A yield a mean scaling exponent of −0.45 ± 0.14 on high-quality traces; every other normalization combination destroys the negative sign, producing positive or noisy values. The scaling exponent is therefore a magnitude-dependent observable: it measures how connectivity mass is distributed across scales, and normalization at any stage removes the mass information this measurement requires. Applying Aletheia to the Echo model series, we show that CoT supervised fine-tuning (SFT) on DeepSeek-R1 distilled reasoning traces produces substantial cross-domain transfer. Echo v5 (Llama 3.2 3B Instruct, 4,979 filtered traces, 37 minutes on a consumer GPU) achieves 85.67% on GSM8K (n = 1,319, +7.97pp over base) and transfers to 81.60% on BBH (n = 500, +19.80pp, p < 0.000001) and 79.00% on PIQA (n = 100, +18.00pp, p = 0.0055), despite training exclusively on GSM8K-domain data.
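The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the Aletheia implementation: the helper names (`similarity_graph`, `laplacian`, `fiedler`, `connectivity_by_depth`, `scaling_exponent`) are hypothetical, and the clipped dot-product kernel, the sign-based split, the depth and size cutoffs, per-depth averaging, and the log-log fit against depth + 1 are all assumptions. The two boolean flags reproduce the 2×2 normalization grid the abstract describes; random embeddings stand in for real ones, so the printed values are not results.

```python
import numpy as np


def similarity_graph(X, normalize_embeddings=False):
    """Weighted adjacency from step embeddings X (n_steps x dim).

    Hypothetical kernel: dot product clipped at zero. With
    normalize_embeddings=True rows are scaled to unit length first, so the
    weights become clipped cosine similarities and lose magnitude information.
    """
    X = np.asarray(X, dtype=float)
    if normalize_embeddings:
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
    A = np.clip(X @ X.T, 0.0, None)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def laplacian(A, normalized=False):
    """Unnormalized L = D - A, or symmetric L_sym = I - D^-1/2 A D^-1/2."""
    d = A.sum(axis=1)
    L = np.diag(d) - A
    if normalized:
        with np.errstate(divide="ignore"):
            inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
        L = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    return L


def fiedler(A, normalized=False):
    """Return (lambda_2, Fiedler vector) of the chosen Laplacian."""
    vals, vecs = np.linalg.eigh(laplacian(A, normalized))  # ascending order
    return float(vals[1]), vecs[:, 1]


def connectivity_by_depth(A, max_depth=3, normalized=False, min_size=3):
    """Recursively bisect on the Fiedler-vector sign; mean lambda_2 per depth."""
    per_depth = {}

    def recurse(idx, depth):
        if depth > max_depth or len(idx) < min_size:
            return
        lam2, v = fiedler(A[np.ix_(idx, idx)], normalized)
        per_depth.setdefault(depth, []).append(lam2)
        left, right = idx[v < 0], idx[v >= 0]
        if len(left) and len(right):  # stop on a trivial split
            recurse(left, depth + 1)
            recurse(right, depth + 1)

    recurse(np.arange(A.shape[0]), 0)
    return {d: float(np.mean(lams)) for d, lams in sorted(per_depth.items())}


def scaling_exponent(curve, eps=1e-12):
    """Hypothetical fit: mean lambda_2(d) ~ C * (d + 1) ** beta; returns beta."""
    pts = [(d, lam) for d, lam in sorted(curve.items()) if lam > eps]
    if len(pts) < 2:
        return float("nan")
    depths, lams = (np.array(col, dtype=float) for col in zip(*pts))
    beta, _intercept = np.polyfit(np.log(depths + 1.0), np.log(lams), 1)
    return float(beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((16, 8))  # stand-in for 16 embedded reasoning steps
    for norm_emb in (False, True):
        for norm_lap in (False, True):
            A = similarity_graph(X, normalize_embeddings=norm_emb)
            curve = connectivity_by_depth(A, normalized=norm_lap)
            print(norm_emb, norm_lap, round(scaling_exponent(curve), 3))
```

With both flags off, λ₂ stays in the units of the raw similarity weights and tracks how much edge mass each subgraph retains; the symmetric normalized Laplacian confines eigenvalues to [0, 2] and unit-length embeddings strip vector norms from the weights. That is consistent with the abstract's point that normalization discards the mass information the exponent depends on.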
Echo v7 replicates this transfer on Qwen3.5-4B, achieving 79.42% on the full 27-subtask BBH suite (n = 486, +23.66pp over the matched base model, p < 10⁻¹⁵). Critically, a random-sample SFT baseline (1,245 correct traces drawn from the same DeepSeek-R1 distilled corpus without spectral filtering) achieves identical BBH performance (79.42%, Z = 0.000, p = 1.00), demonstrating that the cross-domain transfer is attributable to CoT SFT on high-quality distilled traces rather than to spectral filtration specifically. This transfer contradicts recent findings that math-only SFT fails to generalize (Huan et al., 2025) and may reflect properties unique to DeepSeek-R1 distilled reasoning patterns. Across heterogeneous model outputs, the Aletheia scaling exponent correlates with trace correctness at r = −0.60 (p < 10⁻⁸), validating it as a structural quality metric whose practical value lies in data regimes where trace quality is unknown or variable, not in corpora where quality is already uniformly high by construction. We further characterize scale-dependent boundary conditions for the method and identify distinct spectral regimes (Fiedler-primary, frustration-primary, and full-eigenspectrum) for different graph scales.
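The abstract does not name its significance test; a pooled two-proportion z-test with a two-sided normal p-value is assumed here, and for PIQA it reproduces the reported p = 0.0055. The helper `two_proportion_z` is hypothetical. Success counts are reconstructed from the reported percentages and sample sizes (79/100 vs 61/100 for PIQA; 386/486 for BBH, assuming the random-sample baseline was scored on the same 486 items). Identical accuracies give Z = 0 and p = 1 under this test, which is the null result reported for the baseline.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test with a two-sided normal p-value."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # PIQA, Echo v5 vs base: 79.00% and 61.00% of n = 100
    z, p = two_proportion_z(79, 100, 61, 100)
    print(f"PIQA: z = {z:.2f}, p = {p:.4f}")  # PIQA: z = 2.78, p = 0.0055
    # BBH, Echo v7 vs random-sample baseline: 386/486 = 79.42% each (equal n assumed)
    z, p = two_proportion_z(386, 486, 386, 486)
    print(f"BBH baseline: z = {z:.2f}, p = {p:.4f}")  # BBH baseline: z = 0.00, p = 1.0000
```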
