On Pursuit of Designing Multi-modal Transformer for Video Grounding