MSR-VTT: A Large Video Description Dataset for Bridging Video and Language