1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs