Integrating 3D convolutional neural networks and transformer for video action recognition