What question did this study set out to answer?

The study aims to address convergence failures that arise when large language models are trained with adaptive momentum algorithms such as Adam.

March 16, 2026 · Open Access

A Strict Convergence Theorem for LLMs under Adaptive Moment Estimation: Mathematical Proof, Empirical Validation, and Lean 4 Formalization

Key Points

Developed a mathematical proof using a monotonically decreasing preconditioner matrix
Established a dominance constraint for adaptive learning rates
Utilized a coupled Lyapunov Function to analyze convergence behavior
Constructed a telescopic sum of the descent lemma under L-smooth constraints
Proved that the expected true gradient converges to 0 as T approaches infinity
Guaranteed that AI training escapes infinite oscillation and reaches a topological floor
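
The telescoping argument named in the key points can be illustrated in a deliberately simplified setting. The sketch below is not the paper's proof: it drops the preconditioner and the coupled Lyapunov function, and assumes a fixed scalar step size η, an unbiased stochastic gradient gₜ, and a second-moment bound G² (all assumptions introduced here for illustration).

```latex
% Simplified illustration only. Assumptions (not taken from the paper):
%   x_{t+1} = x_t - \eta g_t,  E[g_t | x_t] = \nabla f(x_t),  E\|g_t\|^2 \le G^2,
%   f is L-smooth and bounded below by f^\star.
\begin{align*}
f(x_{t+1}) &\le f(x_t) + \langle \nabla f(x_t),\, x_{t+1}-x_t \rangle
              + \tfrac{L}{2}\,\|x_{t+1}-x_t\|^2
  && \text{(descent lemma)} \\
\mathbb{E}[f(x_{t+1})] &\le \mathbb{E}[f(x_t)]
              - \eta\,\mathbb{E}\|\nabla f(x_t)\|^2 + \tfrac{L\eta^2}{2}\,G^2
  && \text{(take expectations)} \\
\eta \sum_{t=1}^{T} \mathbb{E}\|\nabla f(x_t)\|^2
           &\le f(x_1) - f^\star + \tfrac{L\eta^2 G^2}{2}\,T
  && \text{(telescoping sum over $t$)} \\
\min_{1 \le t \le T} \mathbb{E}\|\nabla f(x_t)\|^2
           &\le \frac{f(x_1) - f^\star}{\eta\,T} + \frac{L\,\eta\,G^2}{2}
\end{align*}
```

With η = 1/√T the right-hand side is of order 1/√T, so the smallest expected squared gradient norm tends to 0 as T → ∞. The paper's version would have to carry the adaptive preconditioner through each of these steps.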

Abstract

In the landscape of non-convex optimization governing Large Language Models (LLMs), classical adaptive momentum algorithms (such as standard Adam) exhibit critical failure modes in which the adaptive learning rate fluctuates indiscriminately. Such unbounded variance in the preconditioner breaks the theoretical convergence guarantees, causing divergence in specific non-stationary landscapes (the "Exponential Moving Average Failure"). In this formalized proof, we present a mathematically rigorous solution based on a Monotonically Decreasing Preconditioner matrix and a coupled Lyapunov Function. We impose the dominance constraint β₁ < β₂ and implement the historical supremum v̂ₜ = max(v̂ₜ₋₁, vₜ). By constructing a telescopic sum of the descent lemma under L-smooth constraints, we prove that as T → ∞, the expected true gradient strictly converges to 0. This formally guarantees that AI training reaches a foundational topological floor, escaping infinite oscillation.
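
A minimal sketch of the running-maximum update described in the abstract may help make the "monotonically decreasing preconditioner" concrete. This summary does not give the paper's exact algorithm, so the code below is an assumption-laden illustration: it uses the same running-maximum form as AMSGrad, omits bias correction, and every name (`running_max_adam_step`, `run_demo`, the hyperparameter defaults, the toy quadratic objective) is hypothetical.

```python
import numpy as np


def running_max_adam_step(param, grad, m, v, v_hat,
                          lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-like step with a historical maximum of the second moment.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Historical supremum: v_hat never decreases from one step to the next.
    v_hat = np.maximum(v_hat, v)
    # Because v_hat is non-decreasing, this per-coordinate preconditioner
    # is non-increasing over time.
    precond = 1.0 / (np.sqrt(v_hat) + eps)
    param = param - lr * precond * m
    return param, m, v, v_hat, precond


def run_demo(steps=200):
    """Minimise the toy objective f(x) = 0.5 * ||x||^2 (gradient is x)."""
    x = np.array([1.0, -2.0])
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    preconds = []
    for _ in range(steps):
        grad = x  # exact gradient of the toy quadratic (assumed objective)
        x, m, v, v_hat, p = running_max_adam_step(x, grad, m, v, v_hat, lr=0.01)
        preconds.append(p)
    return x, np.array(preconds)


if __name__ == "__main__":
    x_final, preconds = run_demo()
    print("final iterate:", x_final)
    # Each coordinate of the preconditioner should never increase.
    print("preconditioner non-increasing:",
          bool(np.all(np.diff(preconds, axis=0) <= 0.0)))
```

The design point is the single `np.maximum` line: replacing it with `v_hat = v` recovers a plain Adam-style second moment, whose preconditioner can grow again when recent gradients shrink.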

Cite This Study

Eduardo Andres Garcia Lecaros (Sat,) studied this question.

synapsesocial.com/papers/69b79e538166e15b153ab7d4 https://doi.org/10.5281/zenodo.19012501
