We turn the gradient-descent account of in-context learning (ICL) into machine-checked mathematics and falsifiable predictions about real transformers. The formal target is the linear-attention regression identity: a forward pass can implement one gradient-descent step on an implicit least-squares objective.

Maturity: Draft. Target venue: Transactions on Machine Learning Research (TMLR). Includes formal verification (Lean 4 with Python verification scripts). Part of The Latent research program. Related papers in this program: ML Spectral Capacity Bound, Sgd, Universal.
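The regression identity named in the abstract can be checked numerically in a few lines. The following is a minimal sketch, not the paper's verification script: it assumes the standard textbook setup (zero initial weights, squared loss averaged over the context, keys `x_i`, values `y_i`, and an unnormalized, softmax-free attention readout). All function names and the learning rate `lr` are hypothetical and chosen only for illustration.

```python
import random


def gd_step_prediction(xs, ys, x_query, lr):
    """Predict at x_query after one gradient-descent step from w = 0 on
    L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2 (assumed objective)."""
    n, d = len(xs), len(x_query)
    # The gradient at w = 0 is -(1/n) * sum_i y_i * x_i,
    # so the updated weights are w_1 = (lr / n) * sum_i y_i * x_i.
    w1 = [lr / n * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(d)]
    return sum(wj * qj for wj, qj in zip(w1, x_query))


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout with keys x_i, values y_i, query x_query."""
    n = len(xs)
    # Attention scores are plain inner products between each key and the query.
    scores = [sum(kj * qj for kj, qj in zip(x, x_query)) for x in xs]
    # The output is the score-weighted sum of values, scaled by lr / n.
    return lr / n * sum(y * s for y, s in zip(ys, scores))


def make_task(n=16, d=4, seed=0):
    """Sample a noiseless linear regression context and one query point."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(w_true, xi)) for xi in xs]
    x_query = [rng.gauss(0, 1) for _ in range(d)]
    return xs, ys, x_query


xs, ys, x_query = make_task()
pred_gd = gd_step_prediction(xs, ys, x_query, lr=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
# The two predictions should agree up to floating-point rounding.
gap = abs(pred_gd - pred_attn)
```

Both functions compute the same quantity, `(lr / n) * sum_i y_i * (x_i . x_query)`, with the sums taken in a different order, which is the content of the identity under these assumptions. A trained transformer need not realize this construction exactly, which is why predictions about real models have to be tested separately.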
Tamás Nagy (Mon,) studied this question.