Capacity, Scaling, and Grokking from the In-Context Learning = Gradient Descent Mechanism | Synapse