March 3, 2026 | Open Access

R-DMRF-HPE: Robust Dynamic Multi-modal Radar-vision Fusion for Human Pose Estimation

Key Points

Our method achieves an average MPJPE of 62.47 mm on the mmBody dataset, highlighting its effectiveness in complex scenarios.
Using graph neural networks allows for efficient multi-scale spatial feature extraction, enhancing pose estimation accuracy.
Dynamic fusion integrates learnable quality assessments with modal prior weights, improving system adaptability under diverse conditions.
Statistical analysis indicates that all reported improvements are significant (p < 0.001), supporting the reliability of the approach.
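
For readers unfamiliar with the metric quoted above, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The snippet below is a minimal sketch of that standard definition; the function name `mpjpe` and the three-joint example values are illustrative assumptions, not data or code from the paper.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints, in the same units as the input."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical 3-joint skeleton in millimetres (illustrative values only).
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 203.0, 4.0)]
target = [(3.0, 4.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
error_mm = mpjpe(pred, target)
```

Because the error is averaged over joints, a single badly placed joint raises the score only in proportion to its share of the skeleton.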

Abstract

Accurate 3D human pose estimation has important application value in fields such as human–computer interaction, motion analysis, and medical rehabilitation. Traditional single-modal methods have significant limitations in complex environments. This paper proposes a dynamic multi-modal human pose estimation method that fuses visual sensors and millimeter-wave radar. First, we construct a radar point cloud processing framework based on graph neural networks. This framework maintains spatial topological relationships through a k-nearest neighbor graph structure and fuses five-dimensional feature information using a reflection intensity-weighted message passing mechanism. Second, we design a dynamic fusion strategy that combines basic quality assessment, learnable quality assessment, and modal prior weights to achieve quality-aware adaptive fusion. Systematic experiments on two datasets demonstrate the effectiveness of our approach. On the standard environment mRI dataset, our method achieves an MPJPE of 91.82 ± 41.81 mm. On the complex environment mmBody dataset, the average MPJPE is 62.47 ± 22.39 mm. Statistical analysis indicates that all improvements are significant (p < 0.001). This method demonstrates excellent robustness in complex environments.

• Combines visual sensors and radar to enhance robustness in 3D pose estimation.
• Uses graph neural networks for multi-scale spatial and dynamic feature extraction.
• Implements dynamic fusion with learnable weights for system stability under challenging conditions.
• Overcomes limitations of existing methods with improved radar-visual pose estimation.
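
The two mechanisms named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not give the feature layout, the update rule, or the fusion formula, so the five-dimensional layout (x, y, z, velocity, intensity), the 50/50 self/neighbour mix, the normalised-product fusion rule, and all function names below are assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """For each point, list the indices of its k nearest neighbours (xyz distance)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p[:3], q[:3]), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def intensity_weighted_pass(points, neighbours, intensity_index=4):
    """One message-passing step: average each node's feature with the
    intensity-weighted mean of its neighbours' features (assumed rule)."""
    updated = []
    for i, feat in enumerate(points):
        nbrs = neighbours[i]
        total = sum(points[j][intensity_index] for j in nbrs)
        if not nbrs or total <= 0.0:
            updated.append(list(feat))  # no usable neighbours: keep the feature
            continue
        message = [
            sum(points[j][intensity_index] * points[j][d] for j in nbrs) / total
            for d in range(len(feat))
        ]
        updated.append([0.5 * (a + b) for a, b in zip(feat, message)])
    return updated


def fusion_weights(basic, learned, prior):
    """Per-modality weight: normalised product of the three scores (assumed rule)."""
    scores = {m: basic[m] * learned[m] * prior[m] for m in prior}
    total = sum(scores.values())
    if total <= 0.0:
        return {m: 1.0 / len(scores) for m in scores}  # fall back to uniform
    return {m: s / total for m, s in scores.items()}


# Hypothetical radar points: (x, y, z, velocity, intensity).
points = [
    (0.0, 0.0, 0.0, 0.0, 1.0),
    (1.0, 0.0, 0.0, 0.0, 3.0),
    (0.0, 2.0, 0.0, 0.0, 1.0),
    (10.0, 10.0, 10.0, 0.0, 2.0),
]
graph = knn_graph(points, k=2)
smoothed = intensity_weighted_pass(points, graph)

# Hypothetical quality scores; in the paper the learnable score would come from a network.
weights = fusion_weights(
    basic={"radar": 0.8, "vision": 0.4},
    learned={"radar": 0.5, "vision": 0.5},
    prior={"radar": 0.5, "vision": 0.5},
)
```

In this sketch a stronger radar return pulls its neighbours' features toward it, and a modality with a low quality score (for example, vision in poor lighting) receives a smaller share of the fused estimate.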

Cite This Study

Hu et al. studied this question.

synapsesocial.com/papers/69a7603dc6e9836116a2cc88 https://doi.org/10.1016/j.measurement.2026.120687
