Online cultural heritage presentations are transitioning from traditional media to immersive 3D displays, yet high-fidelity on-site 3D modeling faces copyright, security, and preservation constraints. This study proposes a deep learning-based 3D reconstruction method that uses existing panoramic images, demonstrated on Dunhuang’s Mogao Caves. Our automated workflow employs pre-trained depth estimation networks to generate 3D models, substantially reducing costs and technical barriers. Four techniques, Panoramic Display (M1), Box Projection (M2), Photogrammetry (M3), and Computer Vision (M4), were integrated into a unified VR platform. User experiments (N = 33) combining spatial behavior tracking and questionnaires evaluated key performance metrics. The results indicate that, among the four techniques, the computer vision approach (M4) offers the best balance of spatial fidelity, cost-efficiency, and accessibility, providing a scalable solution for resource-limited digital heritage projects.
Xu et al. (Thu,) studied this question.
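
To make the geometric step behind a depth-based workflow concrete, the following is a minimal sketch of how a per-pixel depth map predicted for a panorama could be back-projected into a 3D point cloud. It is an illustration under stated assumptions, not the pipeline used in the study: it assumes the panorama is in equirectangular format, that the depth values are radial distances from the camera center, and that the axes follow a y-up, z-forward convention. The function name `equirect_depth_to_points` is hypothetical, and the depth network itself is omitted.

```python
import numpy as np


def equirect_depth_to_points(depth):
    """Back-project an equirectangular depth map of shape (H, W)
    into an (H*W, 3) point cloud.

    Assumption (not from the source): each depth value is the radial
    distance from the camera center to the surface along the pixel's ray.
    """
    h, w = depth.shape

    # Normalized pixel-center coordinates in (0, 1).
    u = (np.arange(w) + 0.5) / w
    v = (np.arange(h) + 0.5) / h

    # Longitude spans [-pi, pi] left to right; latitude spans
    # [pi/2, -pi/2] top to bottom.
    lon = (u - 0.5) * 2.0 * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray directions (assumed convention: y up, z forward).
    dirs = np.stack(
        [
            np.cos(lat) * np.sin(lon),
            np.sin(lat),
            np.cos(lat) * np.cos(lon),
        ],
        axis=-1,
    )

    # Scale each unit ray by its depth and flatten to a point list.
    return (dirs * depth[..., None]).reshape(-1, 3)
```

As a sanity check, a constant depth map should yield points that all lie on a sphere of that radius, for example `equirect_depth_to_points(np.full((4, 8), 2.0))`. Turning such a point cloud into a textured mesh suitable for a VR platform would require further steps that this sketch does not cover.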