Training instabilities favor flatter solutions in gradient descent | Synapse