What question did this study set out to answer?

The goal is to enhance the accuracy of head pose estimation amidst occlusion and rotation ambiguities.

April 24, 2026
Open Access

Deep6DHead: A 6D Head Pose Estimation Method Based on Deep Feature Enhancement

Key Points

Developed a dual-branch network integrating RGB and depth information for feature fusion.
Utilized a Squeeze-and-Excitation module for adaptive weighting of depth features.
Implemented anatomical constraint loss to correct rotation deviations.
Achieved the lowest roll-axis error, 2.05, which is state-of-the-art performance on that axis.
Achieved a competitive overall mean absolute error (MAE) of 3.45.
Showed significant improvements in robustness and accuracy under extreme head poses.
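
The Squeeze-and-Excitation step named in the key points can be illustrated with a minimal sketch. It shows the generic SE pattern (global average pooling, a bottleneck of two fully connected layers, sigmoid gating, channel-wise scaling) in plain Python. It is not the paper's implementation: the function name `se_recalibrate`, the weight matrices `w1` and `w2`, and the omission of biases are assumptions made for illustration.

```python
import math


def se_recalibrate(features, w1, w2):
    """Generic Squeeze-and-Excitation over C channels, each an H x W grid.

    features: list of C channels, each a list of rows of floats.
    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    Returns (reweighted_features, channel_weights).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid (biases omitted for brevity).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch] for ch, g in zip(features, weights)
    ]
    return scaled, weights


# Two 2x2 depth-feature channels with toy weights.
depth_features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = se_recalibrate(depth_features, [[1.0, 1.0]], [[0.0], [1.0]])
```

In a trained network the two weight matrices are learned, so the gates emphasize whichever depth channels are most informative for the task. Here they are fixed toy values.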

Abstract

To address the accuracy bottlenecks that occlusion and ambiguous rotation representations cause in head pose estimation, we propose Deep6DHead, a 6-degree-of-freedom (6DoF) head pose estimation method based on deep feature enhancement. The method combines RGB and depth information into a four-channel input and fuses RGB-D features through a dual-branch network. First, a Squeeze-and-Excitation (SE) module adaptively weights the depth geometric features of key anatomical regions, recalibrating them channel by channel. Second, building on the 6DoF rotation representation framework, we introduce an anatomical constraint loss based on the nasal bridge normal; by enforcing consistent local geometric orientation, it corrects rotation deviations caused by noise. Finally, the model outputs the rotation matrix end-to-end for final pose estimation. Experiments on the 300W-LP, BIWI, and AFLW2000 datasets demonstrate that the method significantly improves robustness and accuracy, particularly under extreme head poses. Notably, it achieves state-of-the-art performance on the roll axis (lowest error: 2.05) and a competitive overall MAE of 3.45, offering an effective solution for head pose estimation in complex real-world scenarios, including extreme viewing angles.
