Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training