A dominant assumption in large language model (LLM) research is that reliability is primarily a function of model capability, improved through scaling, data, and alignment. In this position paper, we argue that this framing is incomplete. Empirically, LLM behavior remains highly sensitive to prompt structure, context ordering, and interaction history, suggesting that reliability is determined not solely by model capacity but also by how models are controlled at inference time. We propose that modern LLM systems implicitly rely on an unmodeled computational layer, which we term inference-time control. This layer governs task framing, context selection, decision structure, and output constraints, yet it is currently embedded in ad hoc prompting practices. We argue that many observed failure modes, including instability, context drift, and inconsistent constraint adherence, arise from under-specified control rather than insufficient capability. We introduce CogniConsole as a proof-of-concept instantiation of this layer, demonstrating how inference-time control can be externalized into a structured interface that combines programmatic coordination with bounded prompt-based reasoning. Through controllability-oriented probes in a multi-step interactive environment, we show that increasing prompt structure, from unstructured to semi-structured to fully scaffolded, reduces output variance and failure rates under a fixed inference-time control architecture. These results support a shift from model-centric to control-centric explanations of LLM behavior. We argue that inference-time control should be treated as a first-class abstraction, opening a new direction for designing, analyzing, and evaluating language model systems beyond scaling alone.
Figueiredo et al. (Thu,) studied this question.
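To make the four responsibilities attributed to inference-time control more concrete (task framing, context selection, decision structure, and output constraints), the following is a minimal illustrative sketch of how such a layer might be externalized from free-form prompting into an explicit object. It is not CogniConsole's design: the names `ControlSpec`, `select_context`, `build_prompt`, and `violated_constraints`, as well as the "keep the most recent items" context policy, are assumptions introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ControlSpec:
    """Hypothetical container for the four control concerns named in the abstract."""
    task_framing: str                      # how the task is presented to the model
    decision_steps: List[str]              # explicit decision structure
    output_constraints: List[Callable[[str], bool]] = field(default_factory=list)
    max_context_items: int = 3             # budget used by the context-selection policy


def select_context(history: List[str], spec: ControlSpec) -> List[str]:
    # Context selection: keep only the most recent items (a deliberately naive,
    # assumed policy; a real system could rank or filter instead).
    if spec.max_context_items <= 0:
        return []
    return history[-spec.max_context_items:]


def build_prompt(spec: ControlSpec, history: List[str], user_input: str) -> str:
    # Assemble a fully scaffolded prompt from the explicit control specification.
    lines = [f"Task: {spec.task_framing}", "Context:"]
    lines += [f"- {item}" for item in select_context(history, spec)]
    lines.append("Steps:")
    lines += [f"{i}. {step}" for i, step in enumerate(spec.decision_steps, start=1)]
    lines.append(f"Input: {user_input}")
    return "\n".join(lines)


def violated_constraints(output: str, spec: ControlSpec) -> List[int]:
    # Output constraints: return the indices of the checks the model output fails.
    return [i for i, check in enumerate(spec.output_constraints) if not check(output)]
```

In this sketch the model call itself is left out on purpose: the point is only that framing, context, decision structure, and constraint checking live in program state that can be inspected and varied independently of the model, which is the kind of separation the abstract argues for.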