Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training | Synapse