Various frameworks for integrating image and video streams for spatiotemporal information learning employing 2D–3D residual networks for human action recognition | Synapse