Accurate pose estimation is crucial for reliable docking and recovery of Autonomous Underwater Vehicles (AUVs). Traditional vision-based pose estimation methods face inherent challenges: monocular methods often struggle with depth inference, conventional Perspective-n-Point (PnP) algorithms lose accuracy at large viewing angles and have limited noise resistance, and binocular systems involve higher computational complexity. This paper proposes a two-stage algorithm that combines iterative PnP initialization with binocular constraint optimization. By using iterative PnP to establish a reliable initial estimate, the approach avoids the convergence difficulties of direct binocular optimization, while the subsequent binocular refinement leverages stereo geometric constraints to enhance accuracy. Comprehensive evaluation through simulation, land-based experiments, and underwater validation demonstrates consistent performance improvements over conventional geometric methods. In simulation experiments across yaw angles from −60° to 60°, the method achieves improvements of 93.2% in translation accuracy and 28.6% in rotation accuracy compared to iterative PnP. Land-based validation confirms a 32.7% average reduction in rotation error, while underwater experiments demonstrate a 76.5% average reduction in distance error under real optical conditions, including refraction and light attenuation. The method maintains real-time processing capability (2.16 ms per frame), offering a practical solution for AUV pose estimation in docking applications.
Wang et al. (Mon,) studied this question.
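The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the cost functions, so the sketch assumes that stage 1 minimizes left-camera reprojection error and that stage 2 jointly minimizes reprojection error in both cameras with known stereo extrinsics. A plain Gauss-Newton loop with a numerical Jacobian stands in for the iterative PnP solver, and all names (`two_stage_pose`, `R_lr`, `t_lr`, the synthetic marker points and intrinsics) are hypothetical.

```python
import numpy as np

def rodrigues(rvec):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def project(pose, pts, K, R_c=None, t_c=None):
    """Project model points given pose = [rvec, t] relative to the left camera.

    (R_c, t_c) maps left-camera coordinates into the target camera frame;
    the default (None) projects into the left camera itself.
    """
    R, t = rodrigues(pose[:3]), pose[3:]
    p = pts @ R.T + t                      # model frame -> left camera frame
    if R_c is not None:
        p = p @ R_c.T + t_c                # left camera -> right camera frame
    uv = p @ K.T
    return uv[:, :2] / uv[:, 2:3]

def gauss_newton(residual_fn, x0, iters=30, eps=1e-6):
    """Minimize ||residual_fn(x)||^2 using a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        r = residual_fn(x)
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (residual_fn(x + dx) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        x = x + step
        if np.linalg.norm(step) < 1e-10:
            break
    return x

def two_stage_pose(pts, uv_l, uv_r, K, R_lr, t_lr, x0):
    """Stage 1: monocular refinement (assumed stand-in for iterative PnP).
    Stage 2: joint left + right reprojection refinement (assumed cost)."""
    def mono(x):
        return (project(x, pts, K) - uv_l).ravel()

    def stereo(x):
        right = (project(x, pts, K, R_lr, t_lr) - uv_r).ravel()
        return np.concatenate([mono(x), right])

    x_init = gauss_newton(mono, x0)        # stage 1: initial estimate
    return gauss_newton(stereo, x_init)    # stage 2: binocular refinement

# Synthetic, noise-free example (all values are illustrative assumptions).
pts = np.array([[-0.3, -0.3, 0.0], [0.3, -0.3, 0.0], [0.3, 0.3, 0.0],
                [-0.3, 0.3, 0.0], [0.0, 0.0, 0.2], [0.1, -0.2, -0.15]])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R_lr, t_lr = np.eye(3), np.array([-0.12, 0.0, 0.0])   # 12 cm baseline
pose_true = np.array([0.1, 0.3, -0.05, 0.2, -0.1, 3.0])
uv_l = project(pose_true, pts, K)
uv_r = project(pose_true, pts, K, R_lr, t_lr)
pose_est = two_stage_pose(pts, uv_l, uv_r, K, R_lr, t_lr,
                          x0=np.array([0.0, 0.2, 0.0, 0.1, 0.0, 2.7]))
```

Because the synthetic data are noise-free, this example only checks that both stages recover the same pose; any benefit of the stereo term would show up only with noisy detections. In practice the first stage could be replaced by an existing iterative PnP solver, and the second by a damped (Levenberg-Marquardt) optimizer.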