The companion core paper establishes, and machine-checks, a single identity: a transformer's forward pass can implement one gradient-descent step on an implicit least-squares objective (the ICL=GD mechanism). Maturity: Short Draft. Target venue: Transactions on Machine Learning Research (TMLR). Includes formal verification (Lean 4 with Python verification scripts). Part of The Latent research program. Related papers in this program: ML In Context Gradient Descent.
Tamás Nagy studied this question.
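The identity stated above can be checked numerically in a toy setting. The following is a minimal sketch, not the paper's Lean 4 formalization or its verification scripts: it assumes a single softmax-free (linear) self-attention layer with a residual connection, scalar labels, tokens of the form (x_i, y_i), a query token (x_query, 0), and a gradient-descent step taken from the zero weight vector. The function names, the weight matrices `W_k`, `W_q`, `W_v`, and the learning rate `lr` are all illustrative assumptions.

```python
import numpy as np


def gd_step_prediction(X, y, x_query, lr):
    """Predict at x_query after one gradient-descent step from w = 0 on
    L(w) = 1/(2n) * sum_i (w @ x_i - y_i)**2."""
    n, d = X.shape
    w0 = np.zeros(d)
    grad = X.T @ (X @ w0 - y) / n   # gradient of L at w0
    w1 = w0 - lr * grad             # one gradient-descent step
    return w1 @ x_query


def linear_attention_prediction(X, y, x_query, lr):
    """Same prediction, read off the label slot of the query token after one
    linear (softmax-free) self-attention layer with a residual connection.
    The weights below are hand-chosen for illustration (an assumption)."""
    n, d = X.shape
    tokens = np.hstack([X, y[:, None]])             # context tokens (x_i, y_i)
    query_token = np.concatenate([x_query, [0.0]])  # label slot starts at 0

    W_k = np.zeros((d + 1, d + 1))
    W_k[:d, :d] = np.eye(d)         # keys keep only the x part
    W_q = W_k.copy()                # queries keep only the x part
    W_v = np.zeros((d + 1, d + 1))
    W_v[d, d] = lr / n              # values keep only the scaled label

    keys = tokens @ W_k.T           # shape (n, d + 1)
    values = tokens @ W_v.T         # shape (n, d + 1)
    scores = keys @ (W_q @ query_token)  # x_i . x_query, no softmax
    update = values.T @ scores      # sum_i value_i * score_i
    return (query_token + update)[d]     # residual add, read label slot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))
    y = X @ rng.normal(size=3)
    x_query = rng.normal(size=3)
    print(gd_step_prediction(X, y, x_query, lr=0.1))
    print(linear_attention_prediction(X, y, x_query, lr=0.1))
```

Under these assumptions both functions compute (lr / n) * sum_i y_i (x_i . x_query), so the two printed values should agree up to floating-point error. The sketch covers only this special case (zero initialization, linear attention, one layer).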