Abstract
Diffusion models are powerful generative frameworks that produce high-quality images by denoising latent variables sampled from random noise. However, training with likelihood-based objectives, such as denoising score matching, can locally oversmooth high-frequency details such as fine textures and sharp edges, which limits perceptual fidelity and structural detail. Adversarial training with GANs enhances sharpness but typically requires an additional discriminator network, which increases computational cost and can destabilize training. To address these limitations, we propose Latent Diffusion Generative Adversarial Networks (LD-GAN), a novel framework that integrates adversarial learning into diffusion models without modifying their original pipeline. LD-GAN uses the pretrained variational autoencoder (VAE) of a latent diffusion model as an energy-based discriminator, which enables adversarial training without extra parameters and preserves the structured latent priors learned from large datasets. We also introduce a structural consistency energy that aligns encoder and decoder feature representations, enhancing perceptual quality and compatibility with the pretrained latent space. Extensive experiments demonstrate that LD-GAN significantly improves sample fidelity, perceptual sharpness, and diversity over state-of-the-art baseline methods across various generation tasks while maintaining efficient training dynamics.
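The idea of using an autoencoder as an energy-based discriminator can be illustrated with a minimal sketch. The abstract does not state the LD-GAN objective, so the code below assumes the generic energy-based GAN formulation, in which the energy of a sample is its autoencoder reconstruction error and the discriminator uses a margin loss. The functions `encode`, `decode`, `energy`, `discriminator_loss`, and `generator_loss`, as well as the `margin` parameter, are hypothetical names chosen for illustration, and the toy pair-averaging autoencoder merely stands in for a pretrained VAE.

```python
from typing import List

Vector = List[float]


def encode(x: Vector) -> Vector:
    """Hypothetical lossy encoder: average each adjacent pair of values.

    Assumes len(x) is even. Stands in for a pretrained VAE encoder.
    """
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def decode(z: Vector) -> Vector:
    """Hypothetical decoder: repeat each latent value twice."""
    out: Vector = []
    for v in z:
        out.extend([v, v])
    return out


def energy(x: Vector) -> float:
    """Energy of a sample: mean squared reconstruction error.

    Samples the autoencoder reconstructs well get low energy.
    """
    recon = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)


def discriminator_loss(real: Vector, fake: Vector, margin: float) -> float:
    """Assumed margin loss: low energy for real data, at least `margin` for fakes."""
    return energy(real) + max(0.0, margin - energy(fake))


def generator_loss(fake: Vector) -> float:
    """Assumed generator objective: lower the energy of generated samples."""
    return energy(fake)


if __name__ == "__main__":
    real = [1.0, 1.0, 3.0, 3.0]  # perfectly reconstructed by the toy autoencoder
    fake = [0.0, 2.0, 0.0, 2.0]  # fine detail that the toy autoencoder cannot keep
    print("energy(real) =", energy(real))
    print("energy(fake) =", energy(fake))
    print("discriminator loss =", discriminator_loss(real, fake, margin=2.0))
    print("generator loss =", generator_loss(fake))
```

In this toy setting the autoencoder itself scores samples, so no separate discriminator network is introduced, which loosely mirrors the "no extra parameters" property claimed above. How LD-GAN actually defines its energies, including the structural consistency energy, is not specified in the abstract and is not reproduced here.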
U-Chae Jun
Sookmyung Women's University
Jaeeun Ko
Jiwoo Kang
Sookmyung Women's University
Computer Graphics Forum
synapsesocial.com/papers/69e1d0165cdc762e9d859277 — DOI: https://doi.org/10.1111/cgf.70409