What question did this study set out to answer?

This study sets out to show that the standard attention mechanism in language models is structurally incompatible with an intrinsic geometry on token states.

June 5, 2026 · Open Access

Attention and Intrinsic Geometry are Structurally Incompatible: A Prescriptive Lagrangian Alternative for Language Modelling

Key Points

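Argues that the gap between the smooth, low-curvature hidden-state trajectories of large language models and attention's lack of an intrinsic notion of distance is structural, not empirical.
Proves the Conservative Obstruction Theorem: no scalar potential on token states can reproduce the three defining properties of scaled dot-product attention (asymmetric coupling, coupling-content decoupling, and a normalised influence budget), regardless of dynamical order.
Constructs a second-order Lagrangian language model whose inference is a damped Euler-Lagrange flow, equipping semantic space with an intrinsic Jacobi metric.
Evaluates the model by perplexity on TinyStories and with a shared-potential diagnostic.
The conservative architecture achieves constant-memory inference at a perplexity premium over standard attention.
The shared-potential diagnostic confirms the predicted separation between the attention and Lagrangian dynamical families.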

Abstract

Hidden-state trajectories in large language models trace smooth, low-curvature paths, yet the attention mechanism that produces them admits no intrinsic notion of distance. Here we prove that this gap is structural, not empirical. Our Conservative Obstruction Theorem shows that no scalar potential on token states can reproduce three defining properties of scaled dot-product attention—asymmetric coupling, coupling–content decoupling, and a normalised influence budget—regardless of dynamical order. Standard attention therefore cannot itself be the generator of an intrinsic conservative Riemannian dynamics on token states; recent metrics extracted from transformers are necessarily descriptive overlays, not laws of motion. We then give the positive construction: a second-order Lagrangian language model whose inference is a damped Euler–Lagrange flow, equipping semantic space with an intrinsic Jacobi metric. A complementary Attention Optimality Conjecture pins attention and this geometric alternative to opposite corners of one design lattice. On TinyStories the conservative architecture achieves constant-memory inference at a perplexity premium over attention, and a shared-potential diagnostic confirms the predicted separation between the two dynamical families.
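Two of the attention properties named in the abstract can be checked numerically on a toy example. The sketch below is not taken from the paper. It assumes that "normalised influence budget" refers to each row of the softmax attention matrix summing to one, and that "asymmetric coupling" refers to the weight of token i on token j generally differing from the weight of j on i. The helper name `attention_weights`, the toy sizes, and the random inputs are all hypothetical, and no causal mask is applied.

```python
import math
import random


def attention_weights(queries, keys):
    """Row-wise softmax of the scaled dot products q_i . k_j / sqrt(d)."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical toy inputs: 4 tokens with 8-dimensional queries and keys.
rng = random.Random(0)
n_tokens, dim = 4, 8
Q = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
K = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

A = attention_weights(Q, K)

# Assumed reading of "normalised influence budget": every row sums to 1.
row_sums = [sum(row) for row in A]

# Assumed reading of "asymmetric coupling": A[i][j] != A[j][i] in general.
max_asymmetry = max(
    abs(A[i][j] - A[j][i]) for i in range(n_tokens) for j in range(n_tokens)
)

print("row sums:", [round(s, 6) for s in row_sums])
print("largest |A[i][j] - A[j][i]|:", max_asymmetry)
```

The weights here depend only on queries and keys, while the content being mixed (the value vectors, omitted above) enters separately. That separation is one plausible reading of "coupling-content decoupling", though the paper's formal definition may differ.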


Authors

Dimitar Gueorguiev
