Dual-Head Attention Enables Length Generalization in Transformer Multiplication | Synapse