Cross-Modal Contrastive Learning for Text-to-Image Generation | Synapse