This paper introduces a 3D symbolic action representation framework for human motion understanding that addresses key limitations of existing 2D skeleton-based methods. Using symbolic programming in 3D space, we construct motion primitives by fitting cubic splines to normalized 3D joint trajectories, which captures spatiotemporal dynamics while preserving viewpoint invariance. A multi-feature action segmentation mechanism combines kinematic and geometric cues to detect atomic action units. For temporal action localization, we adopt a transformer-based architecture that models long-range dependencies and learns semantic concepts. Experiments on augmented exercise datasets show that our method significantly outperforms state-of-the-art approaches in segmentation precision, representation compactness, and localization accuracy, while semi-supervised learning reduces its dependence on annotations. The proposed framework offers a discriminative, efficient, and interpretable paradigm for symbolic human motion analysis.
Lu et al. studied this question.
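To make the motion-primitive construction concrete, the following is a minimal sketch of fitting cubic splines to a normalized 3D joint trajectory. It is an illustration under stated assumptions, not the authors' implementation: the normalization (root-relative coordinates divided by a reference length `scale`), the natural boundary conditions, the choice to interpolate through the sampled frames rather than fit in a least-squares sense, and all function names (`normalize`, `natural_cubic_spline`, `fit_primitive`, `evaluate`) are hypothetical choices made for this example.

```python
from bisect import bisect_right


def normalize(joint, root, scale):
    """Express a joint trajectory relative to the root joint, in units of `scale`.

    Assumption: root-centering plus a single scale factor (e.g. a bone length)
    stands in for whatever normalization the framework actually uses.
    """
    return [tuple((j - r) / scale for j, r in zip(jp, rp))
            for jp, rp in zip(joint, root)]


def natural_cubic_spline(t, y):
    """Coefficients (a, b, c, d) of the natural cubic spline through (t[i], y[i]).

    Segment i is a + b*dx + c*dx**2 + d*dx**3 with dx = x - t[i].
    Assumption: natural boundary conditions (zero second derivative at both ends).
    """
    n = len(t) - 1
    h = [t[i + 1] - t[i] for i in range(n)]
    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal (Thomas) solve.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (t[i + 1] - t[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution recovers c, then b and d follow per segment.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (y[j + 1] - y[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return [(y[j], b[j], c[j], d[j]) for j in range(n)]


def fit_primitive(times, points):
    """Fit one spline per axis (x, y, z); the coefficient lists form the primitive."""
    return [natural_cubic_spline(times, [p[k] for p in points]) for k in range(3)]


def evaluate(t, coeffs, x):
    """Evaluate one axis of the primitive at time x (clamped to the last segment)."""
    i = min(max(bisect_right(t, x) - 1, 0), len(coeffs) - 1)
    a, b, c, d = coeffs[i]
    dx = x - t[i]
    return a + dx * (b + dx * (c + dx * d))


# Toy data (made up for illustration): four frames of one joint and a fixed root.
times = [0.0, 1.0, 2.0, 3.0]
root = [(0.0, 0.0, 0.0)] * 4
joint = [(0.0, 0.0, 0.0), (1.0, 2.0, 1.0), (2.0, 4.0, 0.0), (3.0, 6.0, 1.0)]

points = normalize(joint, root, scale=2.0)
primitive = fit_primitive(times, points)
midpoint = [evaluate(times, primitive[k], 1.5) for k in range(3)]
```

Each axis of the primitive is a short list of cubic coefficients, so a trajectory of many frames reduces to a few numbers per segment. That is one plausible reading of the compactness the abstract refers to; the actual symbolic encoding may differ.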