We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models (DMD, SDXL-Turbo, and SDXL-Lightning) on the zero-shot COCO benchmark.
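The idea of a regression loss computed in latent space over an ensemble of augmentations can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-squared-error distance stands in for the learned perceptual metric, and the flip-plus-cutout augmentation, the function names, and the 2x2 patch size are all assumptions chosen for illustration. The one point the sketch is meant to show is that each sampled augmentation is applied identically to the student's output and to the teacher's target before the distance is computed, and the distances are then averaged.

```python
import random


def make_random_augmentation(height, width, rng):
    """Sample one augmentation and return it as a function, so the
    identical transform can be applied to both latents.
    (Hypothetical augmentation set: horizontal flip plus a 2x2 cutout.)"""
    flip = rng.random() < 0.5
    top = rng.randrange(height - 1)   # cutout rows: top, top + 1
    left = rng.randrange(width - 1)   # cutout cols: left, left + 1

    def apply(latent):
        # Copy every row so the caller's latent is never mutated.
        rows = [row[::-1] if flip else list(row) for row in latent]
        for r in range(top, top + 2):
            for c in range(left, left + 2):
                rows[r][c] = 0.0
        return rows

    return apply


def toy_distance(a, b):
    """Stand-in for a learned perceptual distance: plain mean squared error."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def ensembled_latent_loss(pred, target, num_samples=8, seed=0):
    """Average the distance over several shared random augmentations."""
    rng = random.Random(seed)
    height, width = len(pred), len(pred[0])
    total = 0.0
    for _ in range(num_samples):
        augment = make_random_augmentation(height, width, rng)
        total += toy_distance(augment(pred), augment(target))
    return total / num_samples


# Toy 4x4 single-channel "latents" for the student output and teacher target.
student_latent = [[1.0] * 4 for _ in range(4)]
teacher_latent = [[0.0] * 4 for _ in range(4)]
loss = ensembled_latent_loss(student_latent, teacher_latent)
```

In a real setting the latents would be multi-channel tensors and the distance would come from a network trained on latent codes; the sketch only captures the shared-augmentation averaging structure.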
Minguk Kang
Chungbuk National University
Richard Zhang
Adobe Systems (United States)
Connelly Barnes
Adobe Systems (United States)
Kang et al. studied this question.
synapsesocial.com/papers/68e6aec4b6db643587630f4e — DOI: https://doi.org/10.48550/arxiv.2405.05967