This article investigates visual localization through image retrieval in the context of heritage structuring and documentation, leveraging an extensive image dataset collected during the restoration of Notre-Dame de Paris. To address the challenges of image retrieval for localization, we present a retrieval pipeline, called CIR4Loc, based on the composed image retrieval (CIR) paradigm, in which textual modifiers steer retrieval towards configurations better suited to localization. By bridging the gap between visual and spatial retrieval, this approach ensures the selection of images that are both visually relevant and spatially distributed, which improves localization and, more precisely, camera pose estimation. We demonstrate the effectiveness of this proposal in a real-world heritage context, specifically the scientific site related to the restoration of Notre-Dame de Paris, and emphasize the need for retrieval strategies explicitly tailored to spatially aware localization.
Blettery et al. studied this question.
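To make the composed image retrieval idea in the abstract more concrete, the following is a minimal sketch of how a query image and a textual modifier might be combined into a single retrieval query. It is not the CIR4Loc pipeline: the weighted-sum fusion, the `alpha` parameter, the function names, and the toy 3-dimensional embeddings in `GALLERY` are all assumptions made for illustration. In practice the embeddings would come from a vision-language encoder, which the abstract does not specify.

```python
import math

# Hypothetical toy gallery: image name -> embedding (illustrative values only).
GALLERY = {
    "nave_close": [1.0, 0.0, 0.0],
    "nave_wide": [0.7, 0.7, 0.0],
    "transept": [0.0, 0.0, 1.0],
}


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def compose_query(image_emb, text_emb, alpha=0.5):
    """Fuse image and text embeddings with a weighted sum (assumed fusion rule).

    alpha = 0 uses the image only; alpha = 1 uses the textual modifier only.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    img = normalize(image_emb)
    txt = normalize(text_emb)
    fused = [(1.0 - alpha) * i + alpha * t for i, t in zip(img, txt)]
    return normalize(fused)


def retrieve(image_emb, text_emb, gallery, k=3, alpha=0.5):
    """Return the k gallery items most similar to the composed query."""
    query = compose_query(image_emb, text_emb, alpha)
    scored = []
    for name, emb in gallery.items():
        unit = normalize(emb)
        score = sum(q * g for q, g in zip(query, unit))
        scored.append((name, score))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# A close-up query image plus a modifier embedding pointing along a second axis
# (standing in for a textual cue such as a wider viewpoint).
results = retrieve([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], GALLERY, k=2)
```

With `alpha=0.0` the ranking reduces to plain image-to-image retrieval, so comparing the two settings shows how a textual modifier can shift which gallery images are returned. Encouraging spatially distributed results, as the abstract describes, would need additional logic that this sketch does not attempt.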