What question did this study set out to answer?

The study aims to enhance rear-view human tracking and re-identification for robotic visual sensing under challenging conditions.

May 7, 2026 | Open Access

Robust Rear-View Human Tracking for Robotic Visual Sensing: A Spatiotemporal Prediction and Multi-Modal Fusion Approach

Key Points

The study aims to enhance rear-view human tracking and re-identification for robotic visual sensing under challenging conditions.
Developed a lightweight tracking framework with spatiotemporal prediction and multi-modal feature fusion.
Utilized an ego-motion-aware Kalman prediction for maintaining temporal continuity during occlusions.
Employed a multi-factor descriptor combining color histograms and geometric constraints for target re-identification.
Achieved a peak precision of 94.2% and a tracking success rate of 93.4% on a custom Mecanum-wheeled robot.
Demonstrated a 35% reduction in average tracking error in extreme rainy night scenarios, keeping the Center Location Error (CLE) below 11 pixels.
Achieved a target re-identification response time of 72.83 ms during occlusions.
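
The ego-motion-aware Kalman prediction mentioned above can be illustrated with a minimal sketch. The summary does not give the state model, so everything here is an assumption: a constant-velocity model per image axis, hypothetical noise values `q_pos` and `q_vel`, and an `ego_shift` input standing for the apparent image displacement caused by the robot's own motion. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class AxisState:
    """Position/velocity estimate along one image axis, with its 2x2 covariance."""
    pos: float   # pixels
    vel: float   # pixels per second
    p_pp: float  # variance of pos
    p_pv: float  # covariance of pos and vel
    p_vv: float  # variance of vel


def predict_axis(s: AxisState, dt: float, ego_shift: float,
                 q_pos: float = 1.0, q_vel: float = 4.0) -> AxisState:
    """One predict-only Kalman step with a constant-velocity model.

    ego_shift (assumed input) is the apparent image displacement, in pixels,
    that the robot's own motion causes during dt.
    """
    # State transition: x' = F x + u, with F = [[1, dt], [0, 1]].
    pos = s.pos + s.vel * dt + ego_shift
    # Covariance: P' = F P F^T + Q, with Q = diag(q_pos, q_vel).
    p_pp = s.p_pp + 2.0 * dt * s.p_pv + dt * dt * s.p_vv + q_pos
    p_pv = s.p_pv + dt * s.p_vv
    p_vv = s.p_vv + q_vel
    return AxisState(pos, s.vel, p_pp, p_pv, p_vv)


def coast(s: AxisState, dt: float, ego_shifts: list[float]) -> AxisState:
    """Predict through occluded frames, where no measurement update is possible."""
    for shift in ego_shifts:
        s = predict_axis(s, dt, shift)
    return s
```

Because no measurement arrives while the target is occluded, only the predict step runs. The position variance `p_pp` grows with every coasted frame, so a search region derived from it widens the longer the occlusion lasts.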

Abstract

Rear-view human tracking and re-identification remain critical challenges for robotic visual sensing in unmanned vehicles, particularly under adverse weather and severe occlusion. Conventional deep learning models often suffer from feature contamination and trajectory drift under dynamic illumination. To overcome these bottlenecks, we propose a lightweight tracking framework driven by spatiotemporal prediction and multi-modal feature fusion. Specifically, an ego-motion-aware Kalman prediction mechanism maintains temporal continuity during complete occlusions. Upon target reappearance, a multi-factor descriptor that fuses color histograms with geometric constraints is applied within a dynamic Mahalanobis search region. This is coupled with a specular-reflection-penalized adaptive learning rate (η_k) that freezes template updates during severe environmental degradation. Evaluated on a custom Mecanum-wheeled robot, the proposed method achieves a peak precision of 94.2% and a tracking success rate of 93.4%. Experiments in extreme rainy night scenarios demonstrate a 35% reduction in average tracking error, with the Center Location Error (CLE) kept below 11 pixels. Furthermore, the system achieves a target re-identification response time of 72.83 ms during occlusion phases. Overall, the framework delivers a robust, real-time solution for autonomous navigation in complex dynamic environments.
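
The re-identification step described in the abstract can be sketched roughly as follows. The abstract does not state the exact formulas, so the choices below are illustrative assumptions rather than the paper's method: a chi-square gate on the squared Mahalanobis distance, the Bhattacharyya coefficient for histogram similarity, a bounding-box height ratio as the geometric constraint, and a linear penalty on a hypothetical `specular_ratio` that drives the learning rate to zero.

```python
import math

# 99% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_99_2DOF = 9.21


def mahalanobis_sq(dx, dy, cov):
    """Squared Mahalanobis distance for a 2D offset; cov = (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def inside_gate(dx, dy, cov, gate=CHI2_99_2DOF):
    """Accept a candidate only if it lies inside the predicted search region."""
    return mahalanobis_sq(dx, dy, cov) <= gate


def bhattacharyya(p, q):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def height_consistent(h_pred, h_obs, tol=0.3):
    """Assumed geometric constraint: box height changes by at most tol."""
    return abs(h_obs / h_pred - 1.0) <= tol


def learning_rate(specular_ratio, eta0=0.1, freeze_at=0.3):
    """Assumed penalty: shrink eta linearly, freeze at or above freeze_at."""
    if specular_ratio >= freeze_at:
        return 0.0
    return eta0 * (1.0 - specular_ratio / freeze_at)


def update_template(template, observed, eta):
    """Blend the stored histogram toward the new observation by eta."""
    return [(1.0 - eta) * t + eta * o for t, o in zip(template, observed)]
```

With a learning rate of zero, `update_template` returns the stored histogram unchanged. This is one way to keep frames dominated by specular reflections from contaminating the appearance model.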
