Scalable Multi-grained Cross-modal Similarity Query with Interpretability