How Transformers Learn Causal Structure with Gradient Descent | Synapse