Reconstructing large-scale 3D scenes remains challenging because it requires balancing photorealistic quality, real-time rendering, and compact storage. Recent progress in 3D Gaussian Splatting (3DGS) has achieved impressive fidelity and speed, yet at large scale it suffers from excessive primitive counts, which lead to prohibitive storage and rendering costs. To overcome this inefficiency, we introduce a semantic-guided hybrid representation that unifies textured meshes and 3D Gaussians in a single differentiable framework. The key idea is to use meshes for geometrically regular regions such as roads and building facades, while reserving Gaussians for fine, complex details such as vegetation. Our method rests on three technical contributions. First, we develop a semantic-guided adaptive modeling pipeline that fuses multi-view segmentation onto the scene mesh to robustly partition the scene and prune redundant Gaussians. Second, we introduce a high-performance CUDA-based hybrid renderer that combines mesh rasterization with Gaussian splatting, enabling correct occlusion handling and joint optimization of both representations. Third, we propose a mesh-guided sampling strategy that adaptively adds Gaussians to recover fine details in under-reconstructed areas. Extensive experiments on diverse large-scale datasets demonstrate that our approach significantly reduces storage requirements and accelerates rendering while maintaining comparable or superior visual quality.
李虎森 et al. (Thu,) studied this question.
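To make the occlusion handling in the hybrid renderer concrete, the following is a minimal per-pixel sketch in Python, not the CUDA implementation described above. It assumes that the mesh is fully opaque, that each Gaussian covering the pixel has already been reduced to a (depth, alpha, color) triple, and that Gaussians behind the rasterized mesh depth are simply discarded. The function name `composite_pixel` and its parameters are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]
# Hypothetical per-pixel Gaussian sample: (depth, alpha, color).
GaussianSample = Tuple[float, float, Color]


def composite_pixel(
    gaussians: List[GaussianSample],
    mesh_depth: Optional[float],
    mesh_color: Color,
    background: Color = (0.0, 0.0, 0.0),
) -> Color:
    """Blend Gaussians front to back, stopping at an opaque mesh surface.

    mesh_depth is None when no mesh triangle covers this pixel.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through

    # Standard front-to-back alpha compositing over depth-sorted samples.
    for depth, alpha, color in sorted(gaussians, key=lambda g: g[0]):
        if mesh_depth is not None and depth >= mesh_depth:
            break  # this and all later Gaussians lie behind the mesh
        for c in range(3):
            out[c] += transmittance * alpha * color[c]
        transmittance *= 1.0 - alpha

    # Whatever light remains hits the mesh (assumed opaque) or the background.
    behind = mesh_color if mesh_depth is not None else background
    for c in range(3):
        out[c] += transmittance * behind[c]
    return (out[0], out[1], out[2])


# A red Gaussian in front of a green mesh; the blue Gaussian is occluded.
samples = [(1.0, 0.5, (1.0, 0.0, 0.0)), (3.0, 0.5, (0.0, 0.0, 1.0))]
pixel = composite_pixel(samples, mesh_depth=2.0, mesh_color=(0.0, 1.0, 0.0))
```

In this sketch the blue Gaussian at depth 3.0 contributes nothing because the mesh at depth 2.0 terminates the blend; with `mesh_depth=None` both Gaussians would be composited over the background instead. A real renderer would perform the same test per tile against a rasterized depth buffer and would also need gradients with respect to both representations, which this sketch does not cover.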