This paper names a second source of verbose LLM output that the literature has not yet measured. We call it the demand layer: the implicit relational load a user brings into an exchange with a language model. The model compensates for this load with verbose output, in the same way that Zhang et al. (2024) and Hakim (2026) show it compensates for informational uncertainty. The demand layer is upstream of the model. It is not fully addressable by prompt engineering, decoding constraints, or model-side intervention alone. It is addressable at the level of how the user arrives. Inference now dominates AI energy cost at fleet scale, with decoding the largest single contributor and verbose output the most addressable inefficiency within decoding. We argue that fleet-level inference efficiency, output quality, and the AI energy footprint are bounded by this upstream variable as much as by any model-internal property the field is currently optimizing. The paper provides a replication protocol, a falsifiability condition, and an observable taxonomy of demand-layer load, and it references a partner-evaluable session-governance protocol (AXIS) that implements the framework operationally.
Joe Trabocco (Mon,) studied this question.