Point cloud video modeling must address not only the inherent irregularity of point clouds but also the challenge of capturing spatial and temporal representations simultaneously. Current methods approximate the temporal dimension with sequences of 3-D point cloud frames but struggle under sparser conditions. Accurate point trajectory tracking is crucial for capturing temporal dynamics, because point positions are often inconsistent across frames, especially during rapid motion or at low frame rates. Conventional point tube operations aggregate motion features over fixed time windows but fail to capture rapidly changing scenes. Implicit tracking techniques are limited by quadratic time complexity, which restricts their practical use. In this article, we propose a native 4-D framework (N4DF) that guides the network to learn spatio-temporal dynamics from a native 4-D perspective. Furthermore, we devise a dynamic point spatio-temporal (DPST) convolution that adaptively selects the optimal point-tracking strategy: it constructs local plane regions in anchor frames and propagates them to neighboring frames to evaluate the cross-frame movement distances of points. To further enhance the global modeling power of N4DF, we develop a dynamic self-tracking re-encoding (DSTR) module that employs point-wise self-attention to search for relevant points across the entire video. Compared with recent 4-D modeling methods, N4DF demonstrates superior performance on MSR-Action3D and NTU RGB+D for action recognition (+0.7% and +1.2% accuracy, respectively), on HOI4D for action segmentation (+1% accuracy), and on Synthia 4-D and nuScenes-lidarseg for semantic segmentation (+0.49% and +1.7% mIoU, respectively). N4DF is also more robust at low frame rates owing to its native 4-D modeling and adaptive tracking, making it suitable for tracking fast-moving objects in future real-time scenarios.
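As a rough illustration of what point-wise self-attention over an entire point cloud video involves, the following NumPy sketch flattens all frames into one set of (x, y, z, t) points and lets every point attend to every other point. It is not the DSTR module from the article: the function name `pointwise_self_attention`, the identity projections, and the choice to append raw 4-D coordinates to the features are all assumptions made for illustration. The (N, N) score matrix also shows where quadratic cost in the number of points comes from.

```python
import numpy as np

def pointwise_self_attention(points_4d, features):
    """Single-head scaled dot-product self-attention over all points of a video.

    points_4d: (N, 4) array of (x, y, z, t) coordinates, all frames flattened.
    features:  (N, C) array of per-point features.
    Returns the (N, C) re-encoded features and the (N, N) attention matrix.
    """
    # Assumption: use the raw 4-D coordinates as a simple positional term.
    tokens = np.concatenate([features, points_4d], axis=1)  # (N, C + 4)
    # Identity projections keep the sketch parameter-free;
    # a trained model would learn separate query, key, and value maps.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])  # (N, N), quadratic in N
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ features, weights

# Toy video: 3 frames with 5 points each and 8 feature channels per point.
rng = np.random.default_rng(0)
frames, pts_per_frame, channels = 3, 5, 8
xyz = rng.normal(size=(frames * pts_per_frame, 3))
t = np.repeat(np.arange(frames, dtype=float), pts_per_frame)[:, None]
points_4d = np.concatenate([xyz, t], axis=1)
features = rng.normal(size=(frames * pts_per_frame, channels))

out, attn = pointwise_self_attention(points_4d, features)
```

Here `attn[i, j]` is the weight that point `i` places on point `j`, regardless of which frame either point belongs to, which is the sense in which relevant points are searched across the whole video.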
Jingkun Yan
Hongwei Ge
Mingze Cui
IEEE Transactions on Neural Networks and Learning Systems
McGill University
Jilin University
Dalian University of Technology
Yan et al. studied this question.
www.synapsesocial.com/papers/69d892d16c1944d70ce0408d — DOI: https://doi.org/10.1109/tnnls.2026.3678565