Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures