Abstract Large language models (LLMs) require considerable computation and energy resources during training and deployment. While scaling laws for training have guided much recent progress, inference costs represent a significant and growing component of the overall resource burden, particularly for reasoning models. Existing compute-optimality characterizations that consider model size, dataset size and inference tokens in isolation or in fixed combinations may overlook more efficient operating points. We introduce directed stochastic skill search (DS3), a general framework that represents inference as stochastic traversal over a learnt skill graph. From a simplified yet expressive instantiation, we derive closed-form expressions for task success and compute cost across a wide range of inference strategies, including chain-of-thought (CoT) and tree-of-thought (ToT), which enables comparisons by task difficulty and model capability. We extend a prior graph framework of LLM training to include inference and bridge DS3 with empirical scaling laws. We theoretically recover observed patterns, including linear accuracy scaling with log-compute, variation in preferred inference strategies by task and capability, and emergent behaviour elicited by reasoning despite parameter plateaus, and we capture both best-of-N and majority voting (MV) within one framework. By characterizing training-inference interdependencies, our framework deepens theoretical understanding and supports principled algorithmic design and resource allocation. This article is part of the discussion meeting issue ‘Bits, neurons and qubits for sustainable AI’.
Ellis‐Mohr et al. studied this question.