What question did this study set out to answer?

To develop a framework that enhances Transformer architectures by injecting geometric structures into the attention mechanism.

February 24, 2026

Open Access


Geometric Attention: A General Framework for Injecting Discrete Symmetries into Transformers via High-Dimensional Lattices.

Anatolii Kornienko

Key Points

Introduced a geometric attention framework that replaces or augments the standard dot-product score with a geometric bias derived from a fixed set of vectors.
Used the 240 roots of the exceptional Lie group E8 as the geometric core of a Transformer.
Trained the model, Sovereign-Lila-E8, on the TinyStories dataset.
Achieved a validation loss of 0.46, significantly lower than standard Transformer baselines of comparable scale.
Generated fully coherent stories up to the 512-token training context and extrapolated to 1500 tokens without falling into repetitive loops.
Outperformed the official Microsoft baseline (60 million parameters) with only 40 million parameters.
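The key points mention the 240 roots of E8 without saying how they are obtained. As a minimal sketch, assuming the standard textbook coordinates for the E8 root system (the coordinates used in Sovereign-Lila-E8 may differ by a rotation or scaling), the root set can be generated as follows. The helper name `e8_roots` is hypothetical and not taken from the released code.

```python
from itertools import combinations, product


def e8_roots():
    """Return the 240 roots of E8 as 8-tuples of floats (standard coordinates)."""
    roots = []
    # Type 1: two entries equal to +/-1, the rest 0 -> C(8, 2) * 4 = 112 roots.
    for i, j in combinations(range(8), 2):
        for si, sj in product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    # Type 2: all entries +/-1/2 with an even number of minus signs -> 2**8 / 2 = 128 roots.
    for signs in product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            roots.append(signs)
    return roots


roots = e8_roots()
print(len(roots))  # 240
# Every root has squared length 2.
print(all(sum(x * x for x in r) == 2.0 for r in roots))  # True
```

Every root has squared length 2, and the inner product of any two roots is an integer between -2 and 2. This rigid, highly symmetric arrangement is what makes the set attractive as a fixed geometric core.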

Abstract

We propose a general framework for enhancing Transformer architectures by incorporating fixed geometric structures, such as root systems of Lie groups, highly symmetric lattices, or optimal sphere packings, directly into the attention mechanism. This geometric attention replaces or augments the standard dot-product with a geometric bias derived from a predefined set of vectors, encouraging the model to align its representations with the intrinsic symmetries of the chosen structure. The framework is independent of any specific geometry; any finite set of vectors with high symmetry and optimal packing properties can serve as the geometric core. As a concrete case study, we implement a Transformer using the 240 roots of the exceptional Lie group E8 and train it on the TinyStories dataset. The resulting model, Sovereign-Lila-E8, with only 40 million parameters, generates fully coherent stories up to the training context length (512 tokens) and extrapolates gracefully to 1500 tokens without falling into repetitive loops, substantially outperforming the official Microsoft baseline (60M parameters). The same principles can be applied to other highly symmetric objects, such as the Leech lattice in 24 dimensions, opening the door to a new family of compact, efficient language models. Our model achieves a validation loss of 0.46, significantly lower than standard Transformer baselines of comparable scale. The source code is released under the AGPLv3 license.
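The abstract does not give the exact form of the geometric bias. The following is a hypothetical sketch, not the paper's formulation. Each query and key is softly assigned to the root directions (`geometric_code`), and the overlap of the two assignments is added to the scaled dot-product score, or substituted for it when `replace=True`. The names `geometric_code`, `geometric_attention`, `lam`, `tau`, `replace`, and `causal` are illustrative assumptions, as is the choice to work with 8-dimensional queries and keys so they can be compared with E8 roots directly.

```python
import math
import random
from itertools import combinations, product


def e8_roots():
    """The 240 E8 roots in standard coordinates, used here as the fixed vector set."""
    roots = []
    for i, j in combinations(range(8), 2):
        for si, sj in product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(v)
    roots += [list(s) for s in product((0.5, -0.5), repeat=8)
              if sum(x < 0 for x in s) % 2 == 0]
    return roots


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0 handles masked entries
    total = sum(exps)
    return [e / total for e in exps]


def geometric_code(x, roots, tau=1.0):
    """Hypothetical: soft assignment of vector x to the fixed root directions."""
    return softmax([dot(x, r) / tau for r in roots])


def geometric_attention(Q, K, V, roots, lam=1.0, tau=1.0, replace=False, causal=True):
    """Hypothetical geometric attention for one head.

    score(i, j) = q_i . k_j / sqrt(d) + lam * <code(q_i), code(k_j)>, where code() is
    the soft root assignment above. With replace=True the dot-product term is dropped
    and only the geometric bias remains.
    """
    d = len(Q[0])
    q_codes = [geometric_code(q, roots, tau) for q in Q]
    k_codes = [geometric_code(k, roots, tau) for k in K]
    out = []
    for i, (q, qc) in enumerate(zip(Q, q_codes)):
        scores = []
        for j, (k, kc) in enumerate(zip(K, k_codes)):
            if causal and j > i:
                scores.append(-math.inf)  # a language model may not attend to future tokens
                continue
            base = 0.0 if replace else dot(q, k) / math.sqrt(d)
            scores.append(base + lam * dot(qc, kc))  # overlap of root assignments, in [0, 1]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


# Usage: 4 tokens with 8-dimensional queries, keys, and values.
random.seed(0)
T, d = 4, 8
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
out = geometric_attention(Q, K, V, e8_roots(), lam=1.0, tau=0.5)
print(len(out), len(out[0]))  # 4 8
```

Because `roots` is just a list of vectors, swapping in another finite, highly symmetric set changes the geometry without touching the attention code. This mirrors the abstract's point that the framework is independent of any specific geometry.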
