Leveraging Vision-Language Large Models for Interpretable Video Action Recognition with Semantic Tokenization | Synapse