This work investigates AI inference as a dynamic process influenced by system conditions, rather than a fixed execution pathway. It introduces an experimental framework for observing how generation behaviour evolves under constrained compute environments, where latency, resource contention, and limited throughput expose otherwise transient instability patterns. The study is conducted using the Elora runtime system, a control-plane architecture that applies bounded parameter adjustments prior to generation and captures telemetry during execution for analysis. Runtime signals are derived from heuristic proxies over prompt structure, session context, retrieval confidence, and sequence-level behaviour, enabling the detection of patterns such as degeneracy, expansion, and drift. Experiments are performed under deliberately constrained conditions using CPU-based inference, shared system resources, and scenario-based workloads. This setup increases observability of behavioural transitions that are typically difficult to capture in high-performance environments. Across approximately 5.8 million tokens and over 100 hours of runtime, results show that: (i) inference instability follows structured and observable patterns; (ii) baseline generation may exhibit abrupt collapse under pressure; and (iii) bounded pre-inference adjustment is associated with more stable, lower-amplitude behavioural variation. The findings suggest that a portion of inference inefficiency is not random, but arises from identifiable behavioural dynamics that can be observed and influenced through lightweight control mechanisms. This work does not present a fully optimised system or causal proof of improvement. Instead, it establishes a foundation for studying inference behaviour as an observable and partially controllable process, with implications for improving stability, efficiency, and system-level understanding of AI execution. This work is released as a preprint and has not undergone formal peer review.
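To make the idea of a bounded pre-inference adjustment concrete, the following is a minimal illustrative sketch and not the Elora implementation. All names (`SamplingParams`, `repetition_score`, `adjust`), the n-gram repetition proxy for degeneracy, and the adjustment policy and bounds are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingParams:
    """Hypothetical generation parameters set before inference starts."""
    temperature: float
    max_new_tokens: int


def repetition_score(tokens: list[str], n: int = 3) -> float:
    """Fraction of n-grams that repeat an earlier n-gram.

    A crude heuristic proxy for degeneracy, in the range [0, 1].
    """
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def adjust(
    base: SamplingParams,
    degeneracy: float,
    max_temp_delta: float = 0.2,   # assumed bound on temperature change
    max_token_cut: float = 0.25,   # assumed bound on token-budget reduction
) -> SamplingParams:
    """Return parameters nudged by the signal, never beyond the stated bounds.

    The policy (raise temperature, shrink the token budget when recent
    output looks repetitive) is an illustrative choice, not a finding.
    """
    signal = clamp(degeneracy, 0.0, 1.0)
    temperature = base.temperature + signal * max_temp_delta
    max_new_tokens = round(base.max_new_tokens * (1.0 - signal * max_token_cut))
    return SamplingParams(temperature=temperature, max_new_tokens=max_new_tokens)


if __name__ == "__main__":
    recent = "the cat sat the cat sat the cat sat".split()
    score = repetition_score(recent)
    print(adjust(SamplingParams(temperature=0.7, max_new_tokens=400), score))
```

Because the signal is clamped before use, even a wildly out-of-range proxy value cannot move a parameter further than its configured bound, which is the property the phrase "bounded adjustment" refers to in this sketch.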
Nathan E. J. Freestone studied this question.