To address the accuracy bottlenecks in head pose estimation caused by occlusion and ambiguous rotation representations, we propose Deep6DHead, a 6-degree-of-freedom (6DoF) head pose estimation method based on deep feature enhancement. The method integrates RGB and depth information into a four-channel input and fuses the RGB-D features through a dual-branch network. First, a Squeeze-and-Excitation (SE) module performs channel recalibration, adaptively weighting the depth geometric features of key anatomical regions. Second, building on the 6DoF rotation representation framework, we introduce an anatomical constraint loss based on the nasal bridge normal; by enforcing consistency in local geometric orientation, this constraint corrects rotation deviations caused by noise. Finally, the model outputs the rotation matrix end-to-end as the final pose estimate. Experiments on the 300W-LP, BIWI, and AFLW2000 datasets demonstrate that our method significantly improves robustness and accuracy, particularly under extreme head poses. Notably, it achieves state-of-the-art performance on the roll axis (lowest error: 2.05) and a competitive overall MAE of 3.45, providing an effective solution for head pose estimation in complex real-world scenarios, including extreme viewing angles.
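As an illustration of the final step, the following minimal sketch shows one common way a six-number network output can be turned into a rotation matrix. It assumes that the rotation representation referred to above is the widely used continuous 6D parameterization (two 3-vectors orthonormalized by Gram-Schmidt), which the abstract does not spell out. The function names `rotation_from_6d` and `geodesic_angle_deg` are hypothetical and are not taken from Deep6DHead.

```python
import math


def _normalize(v):
    """Return v scaled to unit length; reject near-zero vectors."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        raise ValueError("cannot normalize a near-zero vector")
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(six):
    """Map six raw numbers to a 3x3 rotation matrix (assumed 6D scheme).

    The first three numbers give the first column up to scale; the last
    three give the second column after Gram-Schmidt orthogonalization.
    The third column is the cross product of the first two.
    """
    if len(six) != 6:
        raise ValueError("expected exactly six values")
    a1, a2 = list(six[:3]), list(six[3:])
    b1 = _normalize(a1)
    proj = _dot(b1, a2)
    # Remove the component of a2 along b1, then normalize the remainder.
    b2 = _normalize([y - proj * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    # b1, b2, b3 are the columns of the returned matrix (row-major lists).
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def geodesic_angle_deg(r1, r2):
    """Angle in degrees of the relative rotation between r1 and r2."""
    # trace(r1^T r2) equals the sum of element-wise products.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))
```

Because the orthonormalization is applied to any non-degenerate input, the output is always a valid rotation matrix, which is one reason such a representation avoids the discontinuities of Euler angles and quaternions. The geodesic angle is shown only as one possible way to compare a predicted and a reference rotation; the errors reported above are per-axis and overall MAE values.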
蒋发科 et al. (Wed,) studied this question.