What question did this study set out to answer?

Can a language model that builds the root system of the exceptional Lie algebra E8 into its attention mechanism generate more coherent stories, especially over long contexts?

February 24, 2026 · Open Access

A Geometric Attention Transformer with the E8 Root System: Sovereign-Lila-E8 (Lie Lattice Attention Language Model)

Key Points

Introduces Sovereign-Lila-E8, a Transformer that incorporates the E8 root system into its attention mechanism.
Softly quantizes hidden states into the 240 roots of E8 and adds geometric biases to attention scores.
Trained on the TinyStories dataset with 40 million parameters.
Generates coherent stories up to 512 tokens (the full training length) and extrapolates to 1500 tokens without falling into repetitive loops.
Achieves a validation loss of 0.46–0.6, lower than standard Transformer baselines of comparable scale.
A comparable baseline (Microsoft’s 33M/60M model) exhibits hard loops after 300–500 tokens.

Abstract

We introduce Sovereign-Lila-E8 (Lie Lattice Attention Language Model), a Transformer architecture that incorporates the root system of the exceptional Lie algebra E8 into the attention mechanism. By softly quantizing hidden states into the 240 roots of E8 and adding geometric biases to attention scores, the model achieves dense semantic packing and improved long-context coherence. Trained on the TinyStories dataset with only 40 million parameters, our model generates coherent stories up to 512 tokens (the full training length) and extrapolates gracefully to 1500 tokens without falling into repetitive loops. In contrast, a comparable baseline (Microsoft’s 33M/60M model) exhibits hard loops after 300–500 tokens. We provide mathematical details, experimental results, and qualitative examples. Our model achieves a validation loss of 0.46–0.6, significantly lower than standard Transformer baselines of comparable scale.
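To make the idea of soft quantization onto the E8 roots concrete, the following is a minimal NumPy sketch and not the paper's implementation. The construction of the 240 roots is the standard one (norm squared 2). Everything else is an assumption made for illustration: the function names `soft_quantize` and `geometric_bias`, the softmax-over-inner-products weighting, the `temperature` and `scale` parameters, and the premise that hidden vectors have already been projected to 8 dimensions.

```python
import itertools
import numpy as np


def e8_roots():
    """Return the 240 roots of E8 as a (240, 8) array, each with squared norm 2."""
    roots = []
    # 112 roots: +-1 in two distinct coordinates, 0 elsewhere.
    for i, j in itertools.combinations(range(8), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = np.zeros(8)
            v[i], v[j] = si, sj
            roots.append(v)
    # 128 roots: all coordinates +-1/2 with an even number of minus signs.
    for signs in itertools.product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            roots.append(np.array(signs))
    return np.stack(roots)


def soft_quantize(h, roots, temperature=1.0):
    """Hypothetical soft quantizer: softmax-weighted mixture of the roots.

    h: (n, 8) hidden vectors, assumed already projected to 8 dimensions.
    Returns the quantized vectors (n, 8) and the weights over roots (n, 240).
    """
    logits = h @ roots.T / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ roots, weights


def geometric_bias(w_q, w_k, roots, scale=1.0):
    """Hypothetical attention bias: expected inner product between the roots
    assigned to each query and each key. Would be added to attention scores."""
    gram = roots @ roots.T  # (240, 240) table of root inner products
    return scale * (w_q @ gram @ w_k.T)


roots = e8_roots()
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))  # four toy "hidden states"
h_quant, w = soft_quantize(h, roots, temperature=0.5)
bias = geometric_bias(w, w, roots)
print(roots.shape, h_quant.shape, bias.shape)
```

In this sketch a lower `temperature` pushes each vector toward its single best-aligned root (hard quantization), while a higher one blends many roots. Whether the paper uses this weighting, or this form of bias, is not stated in the abstract.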
