Learning Speech Representation from Contrastive Token-Acoustic Pretraining