What question did this study set out to answer?

This research aims to quantify how much token generation distributions deviate from randomness in large language models.

February 24, 2026 · Open Access

Entropic Deviation as a Measure of Systematic Non-Randomness in Large Language Model Token Generation

Key Points

Introduced a metric called Entropic Deviation to evaluate model outputs.
Measured ED across three LLM architectures and four content domains.
Ran a pre-registered battery of eight falsification tests against the stochastic baseline hypothesis.
Analyzed 7,200 generation traces across three temperature settings.
Six out of eight falsification tests strongly rejected the stochastic hypothesis (p < 0.01).
Found cross-architecture consensus on temperature-dependent effects.
Found evidence of domain sensitivity and autoregressive persistence in token output.

Abstract

Large language models (LLMs) generate text by sampling from token probability distributions, yet the degree to which these distributions deviate from randomness remains underexplored. This paper introduces Entropic Deviation (ED), a normalized information-theoretic metric quantifying the divergence of a model’s output distribution from uniform randomness at each generation step. We present a multi-architecture experimental framework that measures ED across three model families (Llama-3-8B, Phi-3-mini-4K, Mistral-7B), four content domains, and three temperature settings, yielding 7,200 generation traces. A pre-registered battery of eight falsification tests reveals that six of eight tests strongly reject the stochastic baseline hypothesis (p < 0.01), with cross-architectural consensus on temperature-dependent effects, autoregressive persistence, and domain sensitivity. These results provide evidence for systematic, structured non-randomness in token generation that transcends individual architectures.

Note: These are preliminary findings. The current prompt set consists of stimuli that inherently elicit non-random responses (encyclopedic, narrative, and code-related content). A follow-up study incorporating prompts designed to elicit maximally random outputs (e.g., random string generation, dice rolls) is underway and will be reported separately. The full implications of the observed non-randomness patterns can only be assessed once both prompt categories have been analyzed.
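The abstract describes ED only in words and does not give its formula. As a rough illustration of what a normalized divergence from uniform randomness could look like, the sketch below assumes ED = 1 - H(p) / log V, where H(p) is the Shannon entropy of the next-token distribution and V is the vocabulary size. This form is an assumption, not the paper's confirmed definition, and the names `entropic_deviation` and `softmax_with_temperature` are hypothetical helpers introduced here.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities at a given sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropic_deviation(probs):
    """ASSUMED form of ED: 1 - H(p) / log(V).

    Returns 0 for a uniform distribution and 1 for a one-hot distribution.
    The paper's actual definition may differ.
    """
    vocab_size = len(probs)
    if vocab_size < 2:
        raise ValueError("need at least two tokens")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, rel_tol=1e-6):
        raise ValueError("probs must be a valid probability distribution")
    # Shannon entropy in nats; zero-probability tokens contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(vocab_size)


# Toy four-token vocabulary with illustrative logits (not data from the paper).
toy_logits = [2.0, 1.0, 0.0, -1.0]
ed_cold = entropic_deviation(softmax_with_temperature(toy_logits, 0.5))
ed_hot = entropic_deviation(softmax_with_temperature(toy_logits, 1.5))
```

Under this assumed form, lowering the temperature sharpens the distribution and raises ED, so `ed_cold` exceeds `ed_hot` in the toy example. In a per-step analysis, the function would be applied to the model's full next-token distribution at every generation step of a trace.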
