Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention