What question did this study set out to answer?

The aim is to measure the Péclet number (Pe) at the feature level in language models using attribution graphs.

April 1, 2026 · Open Access

Microscopic Pe: Measuring Péclet Flow on Circuit Attribution Graphs

Key Points

Defined the Péclet number on attribution graph edges as the ratio of directed information flow (drift) to undirected spreading (diffusion).
Analyzed jailbreak patterns using the Void Framework's Twilight monitoring system.
Connected macroscopic Péclet scoring with microscopic mechanisms.
Hypothesized that the 12 jailbreak patterns detected by Twilight correspond to specific high-Pe subgraphs.
Hypothesized that Pe increases along the computational path of a jailbreak circuit.

Abstract

Anthropic's circuit tracing (2025) — attribution graphs revealing computational paths through language models — provides the microscope for measuring Péclet number at the feature level. We define Pe on attribution graph edges as the ratio of directed information flow (drift) to undirected spreading (diffusion) through cross-layer transcoder features. Jailbreak circuits should exhibit measurable Pe gradients: Pe increases along the computational path. The 12 jailbreak patterns detected by the Void Framework's Twilight monitoring should correspond to specific high-Pe subgraphs. This connects macroscopic Pe scoring (N=1,344 platforms) to microscopic mechanism, bridging mechanistic interpretability and thermodynamic field theory.
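
To make the drift-to-diffusion ratio concrete, here is a minimal sketch in Python. It is not the paper's estimator, since the abstract does not specify one. It assumes, purely for illustration, that an attribution graph is given as a dictionary of weighted edges, that drift on an edge is the absolute attribution it carries, and that diffusion is the absolute attribution its source node spreads over its other outgoing edges. The names `edge_peclet`, `path_peclet`, and `is_increasing`, and the toy weights, are hypothetical.

```python
from collections import defaultdict


def edge_peclet(edges, eps=1e-12):
    """Assumed per-edge Pe: attribution on this edge (drift) divided by
    attribution the same source spreads over its other edges (diffusion).

    edges: dict mapping (source, target) -> attribution weight.
    This input format and this definition are illustrative assumptions.
    """
    out_total = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_total[src] += abs(weight)

    pe = {}
    for (src, dst), weight in edges.items():
        drift = abs(weight)
        diffusion = out_total[src] - drift
        # eps avoids division by zero when a node has a single outgoing edge.
        pe[(src, dst)] = drift / (diffusion + eps)
    return pe


def path_peclet(pe, path):
    """Pe values on consecutive edges of a node path."""
    return [pe[(a, b)] for a, b in zip(path, path[1:])]


def is_increasing(values):
    """True if the values rise strictly along the path (a positive Pe gradient)."""
    return all(x < y for x, y in zip(values, values[1:]))


# Toy graph with made-up weights; not data from the paper.
toy_edges = {
    ("a", "b"): 0.5, ("a", "c"): 0.5,
    ("b", "d"): 0.9, ("b", "e"): 0.3,
    ("d", "f"): 0.8, ("d", "g"): 0.1,
}
toy_pe = edge_peclet(toy_edges)
toy_profile = path_peclet(toy_pe, ["a", "b", "d", "f"])
print(toy_profile, is_increasing(toy_profile))
```

Under these assumptions, a path whose edges increasingly dominate their sources' outgoing attribution shows a rising Pe profile, which is the kind of gradient the abstract predicts for jailbreak circuits. Other choices of drift and diffusion would give different numbers.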

Cite This Study

Anthony W. Eckert studied this question.

synapsesocial.com/papers/69ccb76c16edfba7beb896f6 https://doi.org/10.5281/zenodo.19340887