This paper presents a multimodal sensing approach for fine-grained soccer action recognition using synchronized mm-wave FMCW radar and multiview RGB cameras. A TI IWR1443BOOST FMCW radar and three Sony IMX296 global-shutter cameras were used to record seven soccer-related actions, performed in different movement directions, in an outdoor environment. Range–Doppler processing of the radar data yields global mel features together with block representations of mel and radar spectrogram features localized by constant false alarm rate (CFAR) detection, capturing both coarse and fine micro-Doppler characteristics. Camera features are derived from bounding boxes, HOG, optical flow, and pose estimation. Classification is performed using logistic regression as the classical model and several deep models, and performance is evaluated using cross-validation. Radar alone achieves moderate performance (macro-F1 of 0.897 with a TCN): it identifies coarse motion successfully but shows limited separability for dribbling-based actions. Camera-only models achieve near-perfect performance (macro-F1 ≥ 0.997 with a 1D-CNN), with nearly diagonal confusion matrices. The best performance is obtained with a cross-modal transformer using multiple cameras (macro-F1 of 0.998). These results demonstrate that cameras alone perform strongly on this action recognition task, and also that radar–camera fusion can improve robustness and enhance the discrimination of finer player movements for outdoor analytics and player-monitoring applications.
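To make the radar processing chain mentioned above concrete, the following is a minimal sketch of range–Doppler processing followed by cell-averaging CFAR on a synthetic frame. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the frame size (64 chirps by 128 samples), the Hann windows, the CFAR variant (cell averaging), and the guard, training, and scale parameters are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def range_doppler_map(frame):
    """Power range-Doppler map of one frame.

    frame: complex array of shape (n_chirps, n_samples) holding beat-signal
    samples (slow time along axis 0, fast time along axis 1).
    """
    n_chirps, n_samples = frame.shape
    win_range = np.hanning(n_samples)
    win_doppler = np.hanning(n_chirps)
    # FFT over fast time resolves range.
    range_fft = np.fft.fft(frame * win_range[None, :], axis=1)
    # FFT over slow time resolves Doppler; fftshift centres zero velocity.
    doppler_fft = np.fft.fft(range_fft * win_doppler[:, None], axis=0)
    doppler_fft = np.fft.fftshift(doppler_fft, axes=0)
    return np.abs(doppler_fft) ** 2  # shape (n_doppler, n_range)


def ca_cfar_1d(power, guard=2, train=8, scale=10.0):
    """Cell-averaging CFAR along a 1D power profile (hypothetical parameters).

    A cell is declared a detection when its power exceeds `scale` times the
    mean power of the training cells on both sides, excluding guard cells.
    """
    n = power.size
    mask = np.zeros(n, dtype=bool)
    for i in range(n):
        left = np.arange(max(0, i - guard - train), max(0, i - guard))
        right = np.arange(min(n, i + guard + 1), min(n, i + guard + train + 1))
        idx = np.concatenate([left, right])
        if idx.size == 0:
            continue
        mask[i] = power[i] > scale * power[idx].mean()
    return mask


# Synthetic frame (assumed sizes): one point target at range bin 20 and
# Doppler bin +10, plus complex Gaussian noise.
rng = np.random.default_rng(0)
n_chirps, n_samples = 64, 128
m = np.arange(n_chirps)[:, None]   # slow-time index
k = np.arange(n_samples)[None, :]  # fast-time index
target = np.exp(2j * np.pi * (20 * k / n_samples + 10 * m / n_chirps))
noise = 0.1 * (rng.standard_normal((n_chirps, n_samples))
               + 1j * rng.standard_normal((n_chirps, n_samples)))
frame = target + noise

rd = range_doppler_map(frame)
peak = np.unravel_index(np.argmax(rd), rd.shape)
# Run CFAR along range in the Doppler row that contains the peak.
detections = ca_cfar_1d(rd[peak[0]])
print("peak (doppler_idx, range_idx):", peak)
print("detected range bins:", np.flatnonzero(detections))
```

In a block-based representation such as the one the abstract describes, detections like these could be used to crop a local window of the spectrogram around the target, although how the blocks are actually formed is not stated here.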
Keyter et al. (Mon,) studied this question.