High-precision 3D reconstruction is essential for the digital preservation of cultural heritage, yet existing methods struggle with low-quality inputs and unstable geometric representations. We propose an end-to-end framework that combines a convolution-Transformer hybrid super-resolution network with an enhanced 3D Gaussian Splatting (3DGS) pipeline. The image enhancement stage employs multi-scale adaptive convolution, an efficient Transformer, and high-similarity attention to recover fine textures and global consistency. For reconstruction, we introduce Dense-SfM initialization, a voxel-decoder hybrid Gaussian representation, and a progressive multi-stage training strategy to improve geometric stability and efficiency. Experiments on the Cultural-Relics and Tanks and Temples datasets show that our method achieves superior PSNR, SSIM, and LPIPS scores while reducing training time and memory usage. The proposed approach offers a non-invasive and robust solution for cultural heritage digitization and provides a practical foundation for high-fidelity 3D reconstruction.
Jia et al. studied this question.
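For readers unfamiliar with the reported metrics, the following is a minimal sketch of how PSNR, the first of the three scores named above, is conventionally computed between a reference image and a rendered view. It is not the authors' evaluation code: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration. SSIM and LPIPS need structural statistics and a learned network respectively, so they are not sketched here.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB (higher means closer to the reference).

    Both images are flat sequences of pixel intensities of equal length.
    max_value is the largest possible intensity (255 assumed for 8-bit data).
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100, 100, 100, 100]
    render = [110, 90, 110, 90]
    print(f"PSNR: {psnr(ground_truth, render):.2f} dB")
```

In practice the same formula is applied per test view and averaged over the dataset; a production pipeline would more likely use an array library than plain Python lists.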