Transformer-Based Visual Grounding with Cross-Modality Interaction | Synapse