3D skeleton-based human activity recognition has gained significant attention because it is robust to variations in background, lighting, and viewpoint. However, challenges remain in capturing spatiotemporal dynamics effectively and in integrating complementary information from multiple data modalities, such as RGB video and skeletal data. To address these challenges, we propose a multimodal fusion framework that combines optical flow-based key frame extraction, data augmentation, and a fusion of skeletal and RGB streams built on self-attention and skeletal attention modules. The model uses a late fusion strategy to combine skeletal and RGB features, which allows it to capture spatial and temporal dependencies more effectively. Extensive experiments on benchmark datasets, including NTU RGB+D, SYSU, and UTD-MHAD, demonstrate that our method outperforms existing models. This work not only improves action recognition accuracy but also provides a foundation for future multimodal integration and real-time applications in fields such as surveillance and healthcare.
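The abstract names two steps that are easy to picture in code: picking key frames from motion information, and combining the two streams late in the pipeline. The sketch below is illustrative only and is not the authors' implementation. It assumes that a scalar motion score per frame (for example, mean optical-flow magnitude) has already been computed, and it assumes a score-level fusion that averages per-class probabilities with a hypothetical `skeleton_weight`; the abstract does not specify the fusion operator, and the attention modules are omitted entirely. All function and parameter names are made up for this example.

```python
import math
from typing import List, Sequence


def select_key_frames(motion: Sequence[float], k: int) -> List[int]:
    """Return indices of the k frames with the largest motion score.

    Assumption: motion[i] is a precomputed scalar such as the mean
    optical-flow magnitude between frame i and frame i + 1.
    Indices are returned in temporal order.
    """
    k = min(k, len(motion))
    ranked = sorted(range(len(motion)), key=lambda i: motion[i], reverse=True)
    return sorted(ranked[:k])


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    skeleton_logits: Sequence[float],
    rgb_logits: Sequence[float],
    skeleton_weight: float = 0.5,
) -> List[float]:
    """Fuse two streams by a weighted average of their class probabilities.

    Assumption: each stream has its own classifier head producing logits
    over the same classes; skeleton_weight is a hypothetical hyperparameter.
    """
    if len(skeleton_logits) != len(rgb_logits):
        raise ValueError("both streams must score the same number of classes")
    p_skel = softmax(skeleton_logits)
    p_rgb = softmax(rgb_logits)
    w = skeleton_weight
    return [w * a + (1.0 - w) * b for a, b in zip(p_skel, p_rgb)]


if __name__ == "__main__":
    motion_scores = [0.1, 0.9, 0.2, 0.7, 0.05]  # made-up values
    print(select_key_frames(motion_scores, k=2))

    fused = late_fusion([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
    print(fused.index(max(fused)))  # index of the predicted class
```

Because each stream is scored independently before the results are combined, either branch can be retrained or replaced without touching the other, which is the usual motivation for late fusion over early feature concatenation.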
Dongwei Xie
Nantong University
Xiaodan Zhang
Beijing University of Posts and Telecommunications
Xiang Gao
Wenzhou Medical University
PLoS ONE
Zhongkai University of Agriculture and Engineering
Guangdong Polytechnic Normal University
Guangdong Police College
Xie et al. studied this question.
synapsesocial.com/papers/69d9fd7d84371aa676a3c5df — DOI: https://doi.org/10.1371/journal.pone.0319656