This paper analyzes transformer-based language models through the lens of the Chomsky hierarchy. We argue that implemented transformers are best characterized as bounded finite-state transition systems (Type-3 by mechanism) under fixed architectural and precision constraints. While such systems can approximate higher language classes within bounded regimes, they do not instantiate pushdown stacks, linear-bounded tapes, or Turing-complete symbolic machinery. The paper distinguishes simulation from instantiation and explores both the formal and the economic implications of this boundary, including compositional drift under invariant density and the runtime cost tradeoffs between probabilistic recomputation and deterministic enforcement.
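The distinction between simulating and instantiating a higher language class can be illustrated with a toy example. The sketch below is an illustrative assumption and does not model transformer internals. It compares a finite automaton with a fixed number of states against a stack-based recognizer on balanced parentheses, a context-free (Type-2) language. The depth bound `k` stands in, hypothetically, for a fixed architectural or precision budget. The names `bounded_dyck_dfa`, `dfa_accepts` and `pushdown_accepts` are invented for this illustration.

```python
from typing import Dict, List, Tuple

DEAD = -1  # absorbing reject state of the finite automaton

Transitions = Dict[Tuple[int, str], int]


def bounded_dyck_dfa(k: int) -> Transitions:
    """Build a DFA over {'(', ')'} with states 0..k plus DEAD (k + 2 states total).

    State i encodes "current nesting depth is i". Opening past depth k, or
    closing with nothing open, moves to DEAD. The state count is fixed in
    advance, so this is a Type-3 mechanism however large k is chosen.
    """
    delta: Transitions = {}
    for depth in range(k + 1):
        delta[(depth, "(")] = depth + 1 if depth < k else DEAD
        delta[(depth, ")")] = depth - 1 if depth > 0 else DEAD
    for sym in "()":
        delta[(DEAD, sym)] = DEAD
    return delta


def dfa_accepts(delta: Transitions, s: str) -> bool:
    """Run the DFA from depth 0; accept iff it ends back at depth 0."""
    state = 0
    for ch in s:
        state = delta[(state, ch)]
    return state == 0


def pushdown_accepts(s: str) -> bool:
    """Unbounded stack-based recognizer for balanced parentheses (Type-2)."""
    stack: List[str] = []
    for ch in s:
        if ch == "(":
            stack.append(ch)
        elif not stack:
            return False  # closing bracket with nothing open
        else:
            stack.pop()
    return not stack


if __name__ == "__main__":
    delta = bounded_dyck_dfa(3)
    for n in range(1, 6):
        s = "(" * n + ")" * n
        print(n, dfa_accepts(delta, s), pushdown_accepts(s))
```

With `k = 3`, both recognizers agree on nesting depths 1 through 3. From depth 4 onward, the DFA rejects while the stack-based recognizer still accepts. This is the sense in which a bounded system approximates a higher class only within its regime: enlarging `k` moves the boundary outward but never removes it.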
Alwyn Aswin (Sun,) studied this question.