What question did this study set out to answer?

The study asks what a machine-checked identity, namely that a transformer's forward pass can implement one gradient-descent step on an implicit least-squares objective (the ICL=GD mechanism), forces to be true about representational capacity and scaling.

June 17, 2026 | Open Access

Capacity, Scaling, and Grokking from the In-Context Learning = Gradient Descent Mechanism

Key Points

The study examines what the verified ICL=GD identity implies for the representational capacity and scaling of transformers.
The work includes formal verification in Lean 4, with accompanying Python verification scripts.
The transformer's forward pass is analyzed in terms of the ICL=GD mechanism.
Related papers in The Latent research program are examined.
The verified identity states that a transformer's forward pass can implement one gradient-descent step on an implicit least-squares objective.
The analysis of representational capacity points to constraints on how transformer models scale.

Abstract

The companion core paper establishes, and machine-checks, a single identity: a transformer's forward pass can implement one gradient-descent step on an implicit least-squares objective (the ICL=GD mechanism). This satellite paper asks what that verified identity forces to be true about representational capacity and scaling.

Maturity: Short Draft.
Target venue: Transactions on Machine Learning Research (TMLR).
Includes formal verification (Lean 4 with Python verification scripts).
Part of The Latent research program.
Related papers in this program: ML In Context Gradient Descent, ML Scaling Laws Latent, ML Spectral Capacity Bound, Universal.
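
As a rough illustration of the identity stated in the abstract, the sketch below compares two predictions for a query point: one from a single gradient-descent step on a least-squares objective, started from zero weights, and one from a softmax-free (linear) attention readout over the same in-context examples. The linear-attention construction is an assumption made for illustration, as are the function names, the learning rate `lr`, and the synthetic data; none of them are taken from the companion paper, whose exact formulation is not reproduced here.

```python
import random


def gd_step_prediction(xs, ys, x_query, lr):
    """Prediction for x_query after ONE gradient-descent step from w = 0
    on L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    # At w = 0 the gradient is -(1/N) * sum_i y_i * x_i,
    # so the updated weights are w_1 = (lr/N) * sum_i y_i * x_i.
    w = [lr / n * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(d)]
    return sum(wj * qj for wj, qj in zip(w, x_query))


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout (illustrative assumption):
    keys = x_i, values = y_i, query = x_query, output scaled by lr/N."""
    n = len(xs)
    scores = [sum(k * q for k, q in zip(x, x_query)) for x in xs]
    return lr / n * sum(y * s for y, s in zip(ys, scores))


# Synthetic in-context regression task (hypothetical data).
random.seed(0)
d, n = 4, 16
w_true = [random.gauss(0, 1) for _ in range(d)]
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ys = [sum(w * xi for w, xi in zip(w_true, x)) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(d)]

gd = gd_step_prediction(xs, ys, x_query, lr=0.1)
attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print("one GD step:     ", gd)
print("linear attention:", attn)
print("match:", abs(gd - attn) < 1e-9)
```

The two functions compute the same quantity, (lr/N) Σ y_i (x_i · x_query), grouped in a different order, which is why they agree up to floating-point rounding. This covers only the single-step, zero-initialization case under the stated assumptions.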
