Large language models (LLMs) generate text by sampling from token probability distributions, yet the degree to which these distributions deviate from randomness remains underexplored. This paper introduces Entropic Deviation (ED), a normalized information-theoretic metric quantifying the divergence of a model’s output distribution from uniform randomness at each generation step. We present a multi-architecture experimental framework that measures ED across three model families (Llama-3-8B, Phi-3-mini-4K, Mistral-7B), four content domains, and three temperature settings, yielding 7,200 generation traces. A pre-registered battery of eight falsification tests reveals that six of the eight strongly reject the stochastic baseline hypothesis (p < 0.01), with cross-architectural consensus on temperature-dependent effects, autoregressive persistence, and domain sensitivity. These results provide evidence for systematic, structured non-randomness in token generation that transcends individual architectures.

Note: These are preliminary findings. The current prompt set consists of stimuli that inherently elicit non-random responses (encyclopedic, narrative, and code-related content). A follow-up study incorporating prompts designed to elicit maximally random outputs (e.g., random string generation, dice rolls) is underway and will be reported separately. The full implications of the observed non-randomness patterns can only be assessed once both prompt categories have been analyzed.
Jarosław Hryszko (Sun,) studied this question.
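The abstract does not give the formula for ED, so the following is only an illustrative sketch of one plausible reading: a per-step score equal to one minus the Shannon entropy of the next-token distribution divided by its maximum, log V, where V is the vocabulary size. Under this assumed form, ED is 0 for a uniform distribution and 1 for a fully deterministic one. The function names `entropic_deviation` and `softmax_with_temperature` are hypothetical and are not taken from the paper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities at a given sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropic_deviation(probs):
    """Assumed form of ED (not confirmed by the source): 1 - H(p) / log(V).

    H(p) is the Shannon entropy of the distribution and V is the number of
    outcomes. The result lies in [0, 1]: 0 for uniform, 1 for one-hot.
    """
    vocab_size = len(probs)
    if vocab_size < 2:
        raise ValueError("need at least two outcomes to normalize entropy")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability terms contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(vocab_size)


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # toy values, not data from the paper
    for temperature in (0.5, 1.0, 1.5):
        step_probs = softmax_with_temperature(toy_logits, temperature)
        print(temperature, entropic_deviation(step_probs))
```

In this sketch a higher temperature flattens the distribution and therefore lowers the score, which is the kind of temperature dependence the abstract refers to. Averaging the per-step value over a generation trace would be one way to obtain a trace-level summary, though the paper may aggregate differently.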