March 3, 2026

A Transformer-Based Multimodal Framework for Tactical Decision Analysis in Team Sports with Application to Football

Key Points

Compared with CNN-LSTM, BiLSTM-Attention, and GNN baselines, the proposed model improves tactical decision prediction accuracy by 28%.
It reduces pressure-induced misclassification by 41% at an inference latency of 52.6 ms, which makes it suitable for near-real-time applications.
The model was trained and tested on a dataset of 500 multimodal sequences drawn from match simulations and recorded games.
Findings underscore the need for broader validation across diverse player demographics and tactical contexts.

Abstract

Modern football demands rapid and accurate tactical decision-making. However, traditional approaches to assessing tactical decision-making have poor ecological validity and do not scale easily. To address these challenges, this article introduces a transformer-based multimodal fusion model that combines player positioning, video, audio, and contextual metadata to classify real-time tactical decisions. Two data sources were used. Controlled, laboratory-style decision-making experiments with 40 male players supported initial validation and a reliability assessment. A larger dataset of 500 multimodal sequences, extracted from extended match simulations and real-game recordings, was then used to train and test the multimodal transformer model. The framework processes inputs in five stages: data acquisition, preprocessing, feature extraction, transformer-based fusion, and decision classification. Compared with the CNN-LSTM, BiLSTM-Attention, and GNN baselines, the proposed approach improves decision prediction accuracy by 28% and reduces pressure-induced misclassification by 41%, with a low inference latency of 52.6 ms that makes it suitable for near-real-time applications. However, the generalizability of the findings to more diverse tactical contexts and wider athlete demographics is limited by the relatively small and homogeneous sample of young male players from a single region. These results highlight the contribution of transformer-based multimodal fusion to automated tactical decision analysis and point to the need for further validation in more diverse, large-scale match situations.
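
The last two stages named in the abstract, transformer-based fusion and decision classification, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each of the four modalities has already been reduced to one fixed-size feature vector, uses a single attention head with random, untrained weights, and invents three decision labels (`pass`, `dribble`, `shoot`) purely for illustration. All function names are hypothetical.

```python
import math
import random

# Modalities named in the abstract; one feature vector ("token") per modality.
MODALITIES = ("positioning", "video", "audio", "context")
# Hypothetical decision classes, not taken from the article.
DECISIONS = ("pass", "dribble", "shoot")


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def attention_fuse(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention across modality tokens,
    followed by mean pooling into one fused vector."""
    d = len(tokens[0])
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    weights, attended = [], []
    for qi in q:
        # How strongly this modality attends to every modality (itself included).
        w = softmax([dot(qi, kj) / math.sqrt(d) for kj in k])
        weights.append(w)
        attended.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(d)])
    fused = [sum(row[c] for row in attended) / len(attended) for c in range(d)]
    return fused, weights


def classify(fused, Wc):
    """Linear layer plus softmax over the hypothetical decision classes."""
    return softmax(matvec(Wc, fused))


rng = random.Random(0)
d = 8  # assumed feature size per modality
# Placeholder features; a real pipeline would produce these in the
# acquisition, preprocessing, and feature-extraction stages.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in MODALITIES]
Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))
Wc = random_matrix(len(DECISIONS), d, rng)

fused, attn = attention_fuse(tokens, Wq, Wk, Wv)
probs = classify(fused, Wc)
predicted = DECISIONS[probs.index(max(probs))]
```

Each row of `attn` shows how one modality weights the others, which is the mechanism that lets a fusion model lean on, say, positioning when audio is uninformative. A full model would typically stack several multi-head layers and learn the weights from labeled sequences.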
