Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning | Synapse