ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs | Synapse