Demystifying the Communication Characteristics for Distributed Transformer Models