Large language models (LLMs) exhibit remarkable generative capabilities but suffer from a fundamental reliability problem: they cannot reliably distinguish factual recall from speculative extrapolation. We present a novel approach that induces models to develop thermodynamically distinct computational pathways for these two epistemic modes. Unlike prior work that attempts to classify activation patterns spatially, we find that the dynamics of layer-to-layer hidden state changes provide a more fundamental signal. Through adversarial co-training of LoRA adapters and a predictor network, we demonstrate that delta magnitude patterns (the L2 norms of consecutive hidden state differences) naturally separate factual and speculative content. This thermodynamic signal enables 98.39% classification accuracy (AUC 0.9786) on Qwen2.5-1.5B and 98.13% accuracy (AUC 0.9948, near-perfect separation) on Qwen2.5-7B using only a scalar threshold with zero additional parameters. This is a 5.86 to 7.48 percentage point AUC improvement over the prior state-of-the-art topological entropy methods (AUC 0.92), indicating that temporal dynamics provide more fundamental access to epistemic state than spatial geometry. The method achieves this with a gap of 13.62 units (1.5B) and 13.88 units (7B); by comparison, the best spatial MLP probe reaches 91.15%. Direct replication on the 7B model (4.67× the parameters, identical hyperparameters) confirms that the method transfers across model scale without modification. Our key finding is that spatial activation patterns and topological entropy read downstream echoes of the true signal, while thermodynamic features access the primary output of epistemic specialization. The approach preserves generation quality (eval loss improved from 1.914 to 1.835 at 1.5B and from 2.323 to 1.076 at 7B) while creating interpretable epistemic structure, offering a path toward mechanistically grounded uncertainty quantification in production LLM systems.
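As a rough illustration of the delta magnitude signal described in the abstract, the sketch below computes the L2 norms of consecutive layer-to-layer hidden state differences and compares their mean to a scalar threshold. It is only a minimal sketch under stated assumptions, not the author's implementation: the function names, the mean aggregation over layers and tokens, the array layout `(num_layers + 1, seq_len, hidden_dim)`, and the threshold value are all hypothetical, and the abstract does not say which side of the threshold corresponds to factual or speculative content.

```python
import numpy as np


def delta_magnitudes(hidden_states: np.ndarray) -> np.ndarray:
    """L2 norms of consecutive hidden state differences.

    hidden_states: array of shape (num_layers + 1, seq_len, hidden_dim),
    one slice per layer output (assumed layout, not from the source).
    Returns an array of shape (num_layers, seq_len).
    """
    deltas = hidden_states[1:] - hidden_states[:-1]  # layer-to-layer change
    return np.linalg.norm(deltas, axis=-1)           # L2 norm per token


def delta_score(hidden_states: np.ndarray) -> float:
    """Collapse delta magnitudes to one scalar (mean is an assumption)."""
    return float(delta_magnitudes(hidden_states).mean())


def exceeds_threshold(hidden_states: np.ndarray, threshold: float) -> bool:
    """Scalar-threshold decision with no learned parameters.

    Mapping True/False to 'speculative'/'factual' is left to the caller,
    because the abstract does not state the direction of the separation.
    """
    return delta_score(hidden_states) > threshold


# Tiny synthetic example: 2 layers, 2 tokens, 4 hidden dimensions.
demo = np.zeros((3, 2, 4))
demo[1, :, 0] = 3.0   # first layer moves each token by (3, 0, 0, 0)
demo[2, :, 0] = 3.0
demo[2, :, 1] = 4.0   # second layer moves each token by (0, 4, 0, 0)
demo_score = delta_score(demo)  # mean of norms [3, 3, 4, 4] = 3.5
```

In practice the hidden states would come from a model forward pass that returns per-layer outputs, and the threshold would be chosen on held-out data; both steps are outside what the abstract specifies.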
Scott Seto
Scott Seto studied this question.
synapsesocial.com/papers/69a91e1fd6127c7a504c1cad — DOI: https://doi.org/10.5281/zenodo.18854967