This paper presents a unified geometric framework for understanding the working mechanisms of large language models. The core thesis is: language is not a sequence of positions, but a sequence of transformations. We reinterpret the word embedding space as fibers of a principal bundle, attention mechanisms as frame transformations, and language generation as probabilistic path sampling of difference vectors. Seven tests (synthetic data, GloVe, BERT) validate the core propositions: difference vectors have low-dimensional structure (2-5 dimensions), different syntactic relations correspond to different subspaces, local sections can glue under compatibility conditions, and non-linearity emerges from twisted gluing. The paper also interprets backpropagation as a feature extractor that discovers basic concepts from human cognition, and treats induction and inference as two directions of the same probabilistic mechanism, encoding and decoding as bidirectional mappings, and translation as different walking orders on the same geometric path. A suite of visualization tools is developed, offering a geometric perspective for understanding, debugging, and validating large language models.
Xiaobo Li (Tue,) studied this question.