Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?