Training-efficient video feature extraction for human-centric multimodal video understanding | Synapse