What question did this study set out to answer?

This research aims to develop a fast and quality-driven method for direct 3D scene stylization from sparse views.

June 20, 2026

GeoStyler: A Generalizable Geometry-aware Diffusion-based Approach for Direct 3D Gaussian Style Transfer

Key Points

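Uses a diffusion model as the first stage to generate a set of geometrically consistent stylized 2D images.
Introduces a hybrid query formulation for self-attention: cross-view geometric information is embedded in the query to enforce 3D consistency, while style is injected through the key and value to preserve scene structure.
Stabilizes denoising with a geometry-aware latent initialization that provides a coherent starting point.
Lifts the stylized 2D images to 3D Gaussians with a decoupled reconstruction network: a geometry branch predicts a 3D scaffold from the original content images, and a parallel style branch predicts appearance from the stylized images.
Produces stylized 3D scenes in seconds, a substantial speedup over optimization-based methods.
Reports state-of-the-art stylization quality and multi-view consistency on the large-scale RealEstate10K and ACID benchmarks.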

Abstract

Direct 3D scene stylization from sparse views remains a significant challenge, as existing optimization-based methods are prohibitively slow and require dense inputs to prevent geometric corruption. While recent direct methods accelerate this process, their rigid decoupling of static geometry from appearance often leads to visual artifacts, where stylistic textures conflict with and distort the underlying scene structure. To address these limitations, we introduce GeoStyler, a direct framework that generates high-fidelity, multi-view consistent stylized 3D scenes in seconds. Our approach reformulates the conventional pipeline by first leveraging a diffusion model to generate a set of geometrically consistent stylized 2D images. The core of this stage is a novel hybrid query formulation for the self-attention mechanism. Specifically, cross-view geometric information is directly embedded into the query to enforce 3D consistency, while style information is independently injected via the key and value to preserve scene structure. This process is further stabilized by a geometry-aware latent initialization that provides a coherent starting point for the denoising process. Subsequently, a decoupled reconstruction network lifts these 2D stylized images to 3D Gaussians. A geometry branch predicts a robust 3D scaffold from the original content images, while a parallel style branch predicts the final appearance from our generated stylized images, ensuring structural integrity is not compromised. Extensive experiments on large-scale benchmarks, including RealEstate10K and ACID, demonstrate that GeoStyler significantly outperforms prior art in stylization quality and multi-view consistency, achieving state-of-the-art performance with a dramatic speedup. Our project page: https://huhuhuxiao.github.io/Geo-Styler/.
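
The hybrid query formulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how geometric information is embedded into the query, so the additive combination below is an assumption, and the function name `hybrid_query_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all tensor shapes are hypothetical. The sketch only shows the general pattern of a geometry-conditioned query attending over style-derived keys and values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def hybrid_query_attention(content_feats, geo_embed, style_feats, w_q, w_k, w_v):
    """Single-head attention with a geometry-conditioned query (illustrative only).

    content_feats: (N, d) content tokens for one view
    geo_embed:     (N, d) cross-view geometric embedding per token (hypothetical)
    style_feats:   (M, d) tokens from the style reference
    w_q, w_k, w_v: (d, d) projection matrices
    """
    # Assumption: geometry enters the query by simple addition.
    q = (content_feats + geo_embed) @ w_q
    # Style enters only through the key and value projections.
    k = style_feats @ w_k
    v = style_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # (N, M), each row sums to 1
    return weights @ v, weights


# Usage with random placeholder data.
rng = np.random.default_rng(0)
n_tokens, m_style, dim = 6, 4, 8
content = rng.normal(size=(n_tokens, dim))
geometry = rng.normal(size=(n_tokens, dim))
style = rng.normal(size=(m_style, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
stylized, attn = hybrid_query_attention(content, geometry, style, w_q, w_k, w_v)
```

In this sketch each output token is a weighted average of style-derived values, so the geometric embedding influences only which style tokens a location attends to, not the values themselves.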
