Improving Generalization Performance by Switching from Adam to SGD | Synapse