Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning