Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning | Synapse