The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima