We propose a general framework for enhancing Transformer architectures by incorporating fixed geometric structures, such as root systems of Lie groups, highly symmetric lattices, or optimal sphere packings, directly into the attention mechanism. This geometric attention replaces or augments the standard dot product with a geometric bias derived from a predefined set of vectors, encouraging the model to align its representations with the intrinsic symmetries of the chosen structure. The framework is independent of any specific geometry: any finite set of vectors with high symmetry and optimal packing properties can serve as the geometric core. As a concrete case study, we implement a Transformer using the 240 roots of the exceptional Lie group E8 and train it on the TinyStories dataset. The resulting model, Sovereign-Lila-E8, has only 40 million parameters. It generates fully coherent stories up to the training context length (512 tokens) and extrapolates gracefully to 1500 tokens without falling into repetitive loops, substantially outperforming the official Microsoft baseline (60M parameters). Our model achieves a validation loss of 0.46, significantly lower than standard Transformer baselines of comparable scale. The same principles can be applied to other highly symmetric objects, such as the Leech lattice in 24 dimensions, opening the door to a new family of compact, efficient language models. The source code is released under the AGPLv3 license.
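The abstract does not give the exact form of the geometric bias, so the following is only a minimal sketch under our own assumptions, not the author's implementation. The E8 part is the standard construction of the 240 roots: 112 vectors that are permutations of (±1, ±1, 0, 0, 0, 0, 0, 0) and 128 vectors (±1/2)^8 with an even number of minus signs, all of squared norm 2. The attention rule is hypothetical. It assumes 8-dimensional head vectors, snaps each query and key to its nearest E8 root, and adds `lam` times half the inner product of the two snapped roots to the usual scaled dot-product score. The names `e8_roots`, `nearest_root`, `geometric_attention`, and `lam` are illustrative only.

```python
import itertools
import math


def e8_roots():
    """Return the 240 roots of E8 as 8-tuples of floats (standard construction)."""
    roots = []
    # 112 roots: two nonzero entries equal to +1 or -1, all others 0.
    for i, j in itertools.combinations(range(8), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    # 128 roots: every entry is +1/2 or -1/2, with an even number of minus signs.
    for signs in itertools.product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            roots.append(signs)
    return roots


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_root(v, roots):
    # All E8 roots have squared norm 2, so |v - r|^2 = |v|^2 + 2 - 2 v.r and
    # the root with the largest inner product is also the closest one.
    return max(roots, key=lambda r: dot(v, r))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def geometric_attention(queries, keys, values, roots, lam=1.0):
    """Hypothetical geometric attention: scaled dot product plus a root-alignment bias."""
    d = len(queries[0])
    q_roots = [nearest_root(q, roots) for q in queries]
    k_roots = [nearest_root(k, roots) for k in keys]
    outputs = []
    for q, qr in zip(queries, q_roots):
        # For E8, qr.kr lies in {-2, -1, 0, 1, 2}, so the bias lies in [-lam, lam].
        scores = [dot(q, k) / math.sqrt(d) + lam * dot(qr, kr) / 2.0
                  for k, kr in zip(keys, k_roots)]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


roots = e8_roots()
print(len(roots))  # 240
```

One design note: the snapping step is deliberately nonlinear. A linear bias such as Σ_r (q·r)(k·r) would add nothing new, because the E8 roots satisfy Σ_r r rᵀ = 60·I, so that bias reduces to 60·(q·k), which is just a rescaled dot product.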
Anatolii Kornienko
www.synapsesocial.com/papers/699d401ade8e28729cf65275 — DOI: https://doi.org/10.5281/zenodo.18729722