Learning Temporal Co-Attention Models for Unsupervised Video Action Localization