Efficient spatio-temporal modeling for human action recognition from RGB streams using unitary temporal encoding and adaptive consistency refinement | Synapse