Multi Modal Fusion for Video Retrieval based on CLIP Guide Feature Alignment