Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval | Synapse