Distilling Vision-Language Models on Millions of Videos | Synapse