ABSTRACT Camera pose estimation is an important task in computer vision. However, many deep learning-based camera pose estimation models have large parameter counts and require substantial computational resources during training, and their accuracy often declines in complex environments, such as scenes with dense textures or motion blur. To address these issues, this study proposes LMNNet, a lightweight camera pose regression network for complex scenes. To reduce the parameter count and memory consumption, the backbone network replaces standard convolutions with lightweight DGhost convolutions, significantly reducing both the number of network parameters and memory usage. To enhance the robustness of the model in complex scenes, a multi-head dynamic sparse attention (MHDSA) mechanism is integrated into the encoder. By dynamically allocating feature weights, this mechanism improves the network's ability to focus on key regions during feature extraction. To capture more global and edge feature information, this study proposes a non-local feature fusion (NLFF) module, which significantly improves the accuracy of camera pose estimation through feature interaction and multi-scale feature fusion. LMNNet was evaluated on the 7Scenes indoor dataset and the Cambridge Landmarks outdoor dataset. The results indicate that LMNNet achieves more precise camera pose estimation in complex scenes while significantly reducing the number of parameters and computational memory requirements compared with other absolute pose regression networks.
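The parameter saving from swapping standard convolutions for Ghost-style convolutions can be illustrated with a short calculation. The abstract does not specify how DGhost convolutions are built, so the sketch below assumes the generic Ghost module design (a primary convolution producing a fraction of the output channels, followed by cheap depthwise filters that generate the rest). The function names, the `ratio` and `cheap_k` parameters, and the layer sizes are hypothetical and chosen only for illustration; they are not taken from LMNNet.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a generic Ghost-style module (bias ignored).

    Assumption: a primary k x k convolution produces c_out // ratio
    intrinsic channels, and depthwise cheap_k x cheap_k filters then
    generate the remaining channels from those intrinsic channels.
    This is NOT necessarily how DGhost is defined in the paper.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    # Each intrinsic channel yields (ratio - 1) extra channels,
    # each produced by one depthwise cheap_k x cheap_k filter.
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    std = standard_conv_params(64, 128, 3)
    ghost = ghost_conv_params(64, 128, 3)
    print(f"standard: {std}, ghost-style: {ghost}, reduction: {std / ghost:.2f}x")
```

Under these assumptions the reduction factor approaches `ratio` as the input channel count grows, because the depthwise filters contribute comparatively few weights.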
Wang et al. (Wed,) studied this question.