Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Synapse