What question did this study set out to answer?

The research aims to mathematically formalize the gradient-descent mechanism in in-context learning for transformers.

June 17, 2026 | Open Access

When In-Context Learning Implements Gradient Descent: A Learned Mechanism, Mechanically Verified and Empirically Tested

Key Points

Formalizes the gradient-descent account of in-context learning in transformers as machine-checked mathematics.
Verifies the formal results in Lean 4, with accompanying Python verification scripts.
Derives the linear-attention regression identity for gradient-descent steps.
Develops falsifiable predictions about the behavior of real transformers.
Proposes that one forward pass can implement a single gradient-descent step on an implicit least-squares objective.
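
For orientation, the identity named in the last point can be written out in its simplest textbook form. This is a sketch under assumptions that are not taken from the paper: scalar targets, a zero initial weight vector, and hypothetical symbols (learning rate \eta, context pairs (x_i, y_i), query x_q). The paper's Lean 4 statement may be more general or differently normalized.

```latex
% Implicit least-squares objective over N in-context examples (assumed form)
L(w) = \frac{1}{2N} \sum_{i=1}^{N} \left( w^{\top} x_i - y_i \right)^2

% Gradient, and one gradient-descent step from w_0 = 0 with learning rate \eta
\nabla L(w) = \frac{1}{N} \sum_{i=1}^{N} \left( w^{\top} x_i - y_i \right) x_i
\qquad
w_1 = w_0 - \eta \, \nabla L(w_0) = \frac{\eta}{N} \sum_{i=1}^{N} y_i \, x_i

% Prediction on the query equals a softmax-free (linear) attention readout
% with keys x_i, values y_i, and query x_q
\hat{y}_q = w_1^{\top} x_q = \frac{\eta}{N} \sum_{i=1}^{N} y_i \, \langle x_i, x_q \rangle
```

Read this way, the attention scores are the inner products \langle x_i, x_q \rangle and the learning rate appears only as an overall scale on the output.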

Abstract

We turn the gradient-descent account of in-context learning (ICL) into machine-checked mathematics and falsifiable predictions about real transformers. The formal target is the linear-attention regression identity: a forward pass can implement one gradient-descent step on an implicit least-squares objective. Maturity: Draft. Target venue: Transactions on Machine Learning Research (TMLR). Includes formal verification (Lean 4 with Python verification scripts). Part of The Latent research program. Related papers in this program: ML Spectral Capacity Bound, Sgd, Universal.
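
The linear-attention regression identity described in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's verification scripts: it assumes scalar targets, a zero initial weight vector, and an unnormalized softmax-free attention readout scaled by the learning rate over the context length. All function and variable names are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def gd_step_prediction(xs, ys, x_query, lr):
    """Predict on x_query after one gradient-descent step from w = 0
    on L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2."""
    n = len(xs)
    dim = len(x_query)
    w0 = [0.0] * dim
    # Gradient at w0: (1/N) * sum_i (w0 . x_i - y_i) * x_i
    grad = [
        sum((dot(w0, x) - y) * x[j] for x, y in zip(xs, ys)) / n
        for j in range(dim)
    ]
    w1 = [w0[j] - lr * grad[j] for j in range(dim)]
    return dot(w1, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout: keys x_i, values y_i, query x_query,
    scaled by lr / N (an assumed normalization)."""
    n = len(xs)
    return (lr / n) * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# Toy in-context regression task with noiseless linear targets.
random.seed(0)
N, DIM, LR = 32, 5, 0.1
w_true = [random.gauss(0, 1) for _ in range(DIM)]
xs = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
ys = [dot(w_true, x) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(DIM)]

gd_pred = gd_step_prediction(xs, ys, x_query, LR)
attn_pred = linear_attention_prediction(xs, ys, x_query, LR)

# The two predictions should agree up to floating-point rounding.
print(abs(gd_pred - attn_pred) < 1e-9)
```

The check only covers the exact, linear-attention case from a zero initialization. Whether trained softmax transformers behave this way is the empirical question the falsifiable predictions address.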


Cite This Study

Tamás Nagy studied this question.

synapsesocial.com/papers/6a323b21d50b63ecad205cae https://doi.org/10.5281/zenodo.20708733
