Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training