Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics