Synthesizing novel perspectives of complex scenes in high quality using sparse image sequences, especially for those without camera poses, is a challenging task. The key to enhancing accuracy in such scenarios lies in sufficient prior knowledge and accurate camera motion constraints. Therefore, we propose an end-to-end novel view synthesis network named BP-NeRF. It is capable of using sequences of sparse images captured in indoor and outdoor complex scenes to estimate camera motion trajectories and generate novel view images. Firstly, to address the issue of inaccurate prediction of depth map caused by insufficient overlapping features in sparse images, we designed the RDP-Net module to generate depth maps for sparse image sequences and calculate the depth accuracy of these maps, providing the network with a reliable depth prior. Secondly, to enhance the accuracy of camera pose estimation, we construct a loss function based on the geometric consistency of 2D and 3D feature variations between frames, improving the accuracy and robustness of the network's estimations. We conducted experimental evaluations on the LLFF and Tanks datasets, and the results show that, compared to the current mainstream methods, BP-NeRF can generate more accurate novel views without camera poses.
Qiu et al. (Thu,) studied this question.