ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation