Diffusion models have achieved remarkable success in image synthesis but remain computationally expensive due to iterative denoising in high-dimensional pixel space. Latent Diffusion Models (LDMs) alleviate this cost through compressed neural representations, yet often sacrifice high-frequency spatial detail during latent compression. In this work, we introduce a diffusion framework based on 2D Gaussian Splatting, replacing dense pixel-space representations with structured Gaussian primitives. Our approach employs an encoder-diffusion-renderer pipeline, where input images are first mapped into a compact Gaussian latent space parameterized by spatial location, scale, opacity, and color attributes. Diffusion is then performed directly over these Gaussian latents, enabling efficient generation while preserving fine-grained spatial structure. The denoised Gaussian representations are subsequently rendered back into the image domain using differentiable splatting. By operating on compact geometric primitives rather than dense latent tensors, our framework significantly reduces computational overhead while maintaining visual fidelity.
Archan Ghosh (Sat,) studied this question.
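To make the Gaussian parameterization in the abstract concrete, the following is a minimal sketch of the final rendering step: splatting a set of 2D Gaussians, each defined by spatial location, scale, opacity, and color, onto a pixel grid. It is an illustration under stated assumptions, not the paper's renderer. The function name `render_gaussians`, the axis-aligned (rotation-free) scales, and the front-to-back alpha compositing in list order are all assumptions chosen for brevity. Every operation is a smooth NumPy expression, so the same logic would be differentiable if ported to an autodiff framework.

```python
import numpy as np


def render_gaussians(means, scales, opacities, colors, height, width):
    """Splat N axis-aligned 2D Gaussians onto an RGB canvas (hypothetical helper).

    means:     (N, 2) centers as (x, y) in pixel coordinates
    scales:    (N, 2) standard deviations along x and y, in pixels
    opacities: (N,)   peak opacity in [0, 1]
    colors:    (N, 3) RGB values in [0, 1]
    """
    # Pixel coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    image = np.zeros((height, width, 3))
    # Fraction of light not yet absorbed by earlier Gaussians at each pixel.
    transmittance = np.ones((height, width))

    # Assumption: Gaussians are composited front to back in the order given.
    for mu, s, a, c in zip(means, scales, opacities, colors):
        dx = (xs - mu[0]) / s[0]
        dy = (ys - mu[1]) / s[1]
        # Per-pixel alpha: peak opacity times the unnormalized Gaussian falloff.
        alpha = a * np.exp(-0.5 * (dx**2 + dy**2))
        image += (transmittance * alpha)[..., None] * c
        transmittance *= 1.0 - alpha
    return image


# Usage: a single opaque red Gaussian centered on an 8x8 canvas.
img = render_gaussians(
    means=np.array([[4.0, 4.0]]),
    scales=np.array([[1.5, 1.5]]),
    opacities=np.array([1.0]),
    colors=np.array([[1.0, 0.0, 0.0]]),
    height=8,
    width=8,
)
```

In this sketch each primitive carries eight numbers (two for location, two for scale, one for opacity, three for color), so the size of the representation grows with the number of Gaussians rather than with image resolution.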