The dependence of neural radiance fields (NeRF) on accurate camera poses has emerged as a critical obstacle to their widespread real-world application. While recent advances have demonstrated the potential for jointly addressing camera registration and scene reconstruction, these methods inherently rely on a reasonable initialization derived from pose or scene priors, and they struggle with complex scenes involving large camera motions, particularly unordered 360-degree scenes. In this work, we propose Zero-Pose-Prior NeRF to recover radiance fields from unposed and unordered image collections without any prior knowledge. Our key insight is to decompose this complex problem into smaller sub-problems: the camera poses of each sub-problem are estimated first and then serve as self-bootstrapping priors for global pose estimation, which is followed by recursive registration and reconstruction. To achieve this, we first perform scene partitioning to establish a hierarchical structure that describes the registration order from local to global. Thereafter, we devise a conditionally-decoupled positional encoding for NeRFs, which serves as the basic model for camera pose estimation and scene representation. Following this, we develop a recursive registration scheme that estimates the poses of local scenes and registers them into a unified global pose space, ultimately enabling the reconstruction of the entire scene. Experiments on real-world scenes show that our approach outperforms state-of-the-art pose-free methods in both camera pose accuracy and the robustness of radiance field reconstruction, resulting in high-fidelity view synthesis.
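The local-to-global registration order described above can be illustrated with a toy sketch. This is not the method of this work: the NeRF-based pose estimation is replaced here by a closed-form rigid alignment, and the sketch assumes (hypothetically) that poses are 2D camera-to-world transforms, that sibling groups in the hierarchy share at least one image, and that all local frames have the same scale. The names `se2`, `compose`, `inverse`, and `register` are illustrative only.

```python
import math

def se2(theta, tx, ty):
    """Build a 2D rigid pose (rotation theta, translation tx, ty) as a 3x3 matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b for 3x3 nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(p):
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    r00, r01, tx = p[0]
    r10, r11, ty = p[1]
    return [[r00, r10, -(r00 * tx + r10 * ty)],
            [r01, r11, -(r01 * tx + r11 * ty)],
            [0.0, 0.0, 1.0]]

def register(node):
    """Recursively merge a hierarchy of local pose sets into one frame.

    A leaf is a dict {image_id: pose} in its own local frame; an internal
    node is a (left, right) tuple. The result is expressed in the frame of
    the left-most leaf. Assumption: siblings share at least one image.
    """
    if isinstance(node, dict):
        return dict(node)
    left, right = (register(child) for child in node)
    shared = sorted(set(left) & set(right))
    if not shared:
        raise ValueError("sibling groups must share at least one image")
    anchor = shared[0]
    # Transform that maps the right group's frame into the left group's frame.
    right_to_left = compose(left[anchor], inverse(right[anchor]))
    merged = dict(left)
    for image_id, pose in right.items():
        merged.setdefault(image_id, compose(right_to_left, pose))
    return merged

# Synthetic ground-truth poses, then three overlapping groups in different frames.
truth = {"a": se2(0.0, 0.0, 0.0), "b": se2(0.5, 1.0, 0.0),
         "c": se2(1.0, 2.0, 1.0), "d": se2(1.5, 2.0, 3.0)}

def in_frame(frame, ids):
    """Express the chosen ground-truth poses in an arbitrary local frame."""
    return {i: compose(frame, truth[i]) for i in ids}

tree = ((in_frame(se2(0.0, 0.0, 0.0), ["a", "b"]),
         in_frame(se2(2.0, 5.0, -3.0), ["b", "c"])),
        in_frame(se2(-1.0, -4.0, 7.0), ["c", "d"]))
recovered = register(tree)
```

Because the first group is defined in the ground-truth frame, `recovered` should match `truth` up to floating-point error. With noisy pose estimates, a single shared image would no longer suffice, and the alignment would have to be estimated from several shared cameras and then refined.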
Liu et al. (Thu,) studied this question.