Deconfounded and Explainable Interactive Vision-Language Retrieval of Complex Scenes | Synapse