In the non-convex optimization that governs the training of Large Language Models (LLMs), classical adaptive momentum algorithms such as standard Adam exhibit a critical failure mode: the adaptive learning rate can fluctuate without bound. This unbounded variance in the preconditioner breaks the theoretical convergence guarantees and causes divergence on certain non-stationary landscapes (the "Exponential Moving Average Failure"). In this proof, we present a rigorous remedy based on a monotonically decreasing preconditioner matrix and a coupled Lyapunov function. We impose the dominance constraint $\beta_1 < \beta_2$ and replace the second-moment estimate with its historical supremum, $\hat{v}_t = \max(\hat{v}_{t-1}, v_t)$. By telescoping the descent lemma under the $L$-smoothness assumption, we prove that as $T \to \infty$ the expected norm of the true gradient converges to 0. This guarantees that training approaches a stationary point rather than oscillating indefinitely.
Eduardo Andres Garcia Lecaros (Sat,) studied this question.
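The historical-supremum update described in the abstract can be illustrated with a minimal sketch. It assumes an AMSGrad-style rule in which the running maximum of the second-moment estimate is used as the preconditioner, so the per-coordinate effective step size never increases. The function names (`amsgrad_step`, `quadratic_grad`, `run_demo`), the hyperparameter values, the 1/sqrt(t) step-size schedule, and the toy quadratic objective are all illustrative assumptions and are not taken from the proof itself.

```python
import math


def amsgrad_step(theta, grad, m, v, v_hat, t,
                 lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad-style update on lists of floats (hypothetical helper)."""
    # Assumed schedule: base step size decays as 1/sqrt(t).
    step = lr / math.sqrt(t)
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for th, g, m_i, v_i, vh_i in zip(theta, grad, m, v, v_hat):
        # Exponential moving averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Historical supremum: the preconditioner denominator never shrinks.
        vh_i = max(vh_i, v_i)
        new_theta.append(th - step * m_i / (math.sqrt(vh_i) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
        new_v_hat.append(vh_i)
    return new_theta, new_m, new_v, new_v_hat


def quadratic_grad(theta, curvature=(1.0, 10.0)):
    """Gradient of the toy objective f(x) = 0.5 * sum(a_i * x_i**2)."""
    return [a * x for a, x in zip(curvature, theta)]


def run_demo(steps=2000):
    """Run the sketch on the toy quadratic and record v_hat at every step."""
    theta = [3.0, -2.0]
    m, v, v_hat = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    v_hat_history = []
    for t in range(1, steps + 1):
        grad = quadratic_grad(theta)
        theta, m, v, v_hat = amsgrad_step(theta, grad, m, v, v_hat, t)
        v_hat_history.append(tuple(v_hat))
    final_grad_norm = math.sqrt(sum(g * g for g in quadratic_grad(theta)))
    return theta, v_hat_history, final_grad_norm


if __name__ == "__main__":
    theta, history, grad_norm = run_demo()
    print("final parameters:", theta)
    print("final gradient norm:", grad_norm)
```

Because `v_hat` is a running maximum, each coordinate's denominator is non-decreasing, which is the property the monotone preconditioner relies on. The deterministic toy problem only illustrates the mechanics and does not stand in for the stochastic, non-convex setting the proof addresses.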