What question did this study set out to answer?

How can diffusion-model sampling be made faster without degrading image quality?

May 15, 2026

Parallel Diffusion Solver via Residual Dirichlet Policy Optimization

Key Points

Aimed to speed up diffusion-model sampling without sacrificing image quality.
Developed the Ensemble Parallel Direction solver (EPD-Solver), an ODE solver that performs multiple independent, parallelizable gradient evaluations per step to reduce truncation error.
Introduced a two-stage optimization framework: distillation of a small set of learnable solver parameters, followed by parameter-efficient reinforcement learning (RL) fine-tuning that treats the solver as a stochastic Dirichlet policy.
Evaluated on image-generation benchmarks (CIFAR-10, FFHQ, ImageNet, LSUN Bedroom) and on text-to-image (T2I) generation.
Achieved state-of-the-art FID scores at a budget of 5 function evaluations (NFE): 4.47 (CIFAR-10), 7.97 (FFHQ), 8.17 (ImageNet), 8.26 (LSUN Bedroom).
Improved human preference scores for T2I generation with Stable Diffusion v1.5 and SD3-Medium.
Outperformed SD3-Medium's official 28-step baseline using only 20 steps.
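Why several parallel directions can beat one can be illustrated on a toy problem. For a scalar drift, the Mean Value Theorem guarantees a single intermediate time τ at which h·v(τ) equals the exact step; for a vector-valued drift it does not, and the average of v over the step is only guaranteed to lie in the convex hull of v's values. A convex combination Σ_k λ_k v(τ_k), with λ on the probability simplex, can therefore match a step that no single direction can. The sketch below is a simplified analogue under our own assumptions, not the authors' implementation: the drift v(t) = (cos t, sin t) is hypothetical and depends only on t (in a real diffusion sampler it depends on the current sample and comes from the denoiser), and `distill` is a brute-force grid search for τ_k and λ_k against a fine-grained reference, standing in for the paper's learned-parameter optimization. All function names are illustrative.

```python
import math


def v(t):
    """Hypothetical 2-D drift that rotates with t, so the trajectory is curved."""
    return (math.cos(t), math.sin(t))


def reference_step(t0, t1, n=10_000):
    """'Teacher' displacement x(t1) - x(t0): the integral of v over [t0, t1] (fine midpoint rule)."""
    h = (t1 - t0) / n
    sx = sy = 0.0
    for i in range(n):
        vx, vy = v(t0 + (i + 0.5) * h)
        sx += vx
        sy += vy
    return (sx * h, sy * h)


def epd_step(t0, t1, taus, lambdas):
    """Ensemble step (t1 - t0) * sum_k lambda_k * v(tau_k).

    The K evaluations v(tau_k) do not depend on each other, so a real sampler could
    batch them into one parallel call. lambdas should be non-negative and sum to 1.
    """
    h = t1 - t0
    dx = dy = 0.0
    for tau, lam in zip(taus, lambdas):
        vx, vy = v(tau)
        dx += lam * vx
        dy += lam * vy
    return (h * dx, h * dy)


def err(a, b):
    """Euclidean distance between two 2-D displacements."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distill(t0, t1, k, grid=60, lam_grid=20):
    """Grid-search stand-in for distillation with k = 1 or 2 directions.

    Returns (error, taus, lambdas) for the candidate closest to the teacher step.
    """
    if k not in (1, 2):
        raise ValueError("this toy search only supports k = 1 or k = 2")
    target = reference_step(t0, t1)
    ts = [t0 + (t1 - t0) * i / (grid - 1) for i in range(grid)]
    if k == 1:
        cands = [((a,), (1.0,)) for a in ts]
    else:
        cands = [((a, b), (j / lam_grid, 1.0 - j / lam_grid))
                 for i, a in enumerate(ts) for b in ts[i + 1:]
                 for j in range(lam_grid + 1)]
    return min((err(epd_step(t0, t1, taus, lams), target), taus, lams)
               for taus, lams in cands)


if __name__ == "__main__":
    t0, t1 = 0.0, math.pi / 2
    target = reference_step(t0, t1)
    euler = err(epd_step(t0, t1, (t0,), (1.0,)), target)  # one evaluation at the step start
    print("Euler error:           ", round(euler, 4))
    print("best 1-direction error:", round(distill(t0, t1, 1)[0], 4))
    print("best 2-direction error:", round(distill(t0, t1, 2)[0], 4))
```

On [0, π/2] the exact displacement is (1, 1), of length √2 ≈ 1.414, while every single-direction step h·v(τ) has length π/2 ≈ 1.571, so no choice of one τ can close the gap; two weighted directions straddling the midpoint get much closer.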

Abstract

Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often suffer significant image-quality degradation under a low-latency budget, primarily due to accumulated truncation errors arising from their inability to capture high-curvature trajectory segments. In this paper, we propose the Ensemble Parallel Direction solver (dubbed EPD-Solver), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step. Motivated by the geometric insight that sampling trajectories are largely confined to a low-dimensional manifold, EPD-Solver leverages the Mean Value Theorem for vector-valued functions to approximate the integral solution more accurately. Importantly, since the additional gradient computations are independent, they can be fully parallelized, preserving low-latency sampling. We introduce a two-stage optimization framework. First, EPD-Solver optimizes a small set of learnable parameters via a distillation-based approach. We further propose a parameter-efficient Reinforcement Learning (RL) fine-tuning scheme that reformulates the solver as a stochastic Dirichlet policy. Unlike traditional methods that fine-tune the massive backbone, our RL approach operates strictly within the low-dimensional solver space, effectively mitigating reward hacking while enhancing performance in complex text-to-image (T2I) generation tasks. In addition, our method is flexible and can serve as a plugin (EPD-Solver_plugin) to improve existing ODE samplers. Extensive experiments demonstrate the effectiveness of EPD-Solver. On validation benchmarks, at the same latency level of 5 NFE (number of function evaluations), the distilled EPD-Solver achieves state-of-the-art FID scores of 4.47 on CIFAR-10, 7.97 on FFHQ, 8.17 on ImageNet, and 8.26 on LSUN Bedroom, surpassing existing learning-based solvers by a significant margin.
On T2I benchmarks, our RL-tuned EPD-Solver significantly improves human preference scores on both Stable Diffusion v1.5 and SD3-Medium. Notably, it outperforms the official 28-step baseline of SD3-Medium with only 20 steps, effectively bridging the gap between inference efficiency and high-fidelity generation.
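The RL stage reformulates the solver as a stochastic Dirichlet policy. A Dirichlet distribution suits simplex-constrained parameters such as weights that mix parallel directions, because every sample is non-negative and sums to 1. The sketch below shows a REINFORCE-style policy-gradient loop under several assumptions of ours: only the mixing weights are treated as the action; the concentration is α = exp(log(κ·λ0) + r), where λ0 stands for hypothetical distilled weights and r is a residual learned by RL starting at zero (one possible reading of "Residual" in the title, not confirmed by this summary); and the reward is a toy quadratic standing in for a human-preference reward model. Function names, κ, the learning rate, and the batch size are all illustrative.

```python
import math
import random


def digamma(x, eps=1e-5):
    """Digamma via a central difference of math.lgamma (fine for x well above eps)."""
    return (math.lgamma(x + eps) - math.lgamma(x - eps)) / (2 * eps)


def sample_dirichlet(alpha, rng):
    """Draw lambda ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def score_wrt_residual(alpha, lam):
    """Gradient of log Dir(lam | alpha) w.r.t. r_i, where alpha_i = exp(base_i + r_i)."""
    psi_total = digamma(sum(alpha))
    return [a * (psi_total - digamma(a) + math.log(w)) for a, w in zip(alpha, lam)]


def train_residual_policy(lam0, reward_fn, kappa=20.0, lr=0.5, steps=200, batch=64, seed=0):
    """REINFORCE on a residual r; returns the final policy mean alpha / sum(alpha)."""
    rng = random.Random(seed)
    base = [math.log(kappa * w) for w in lam0]  # frozen anchor: the policy mean starts at lam0
    r = [0.0] * len(lam0)                        # residual learned by RL, initialized to zero
    for _ in range(steps):
        alpha = [math.exp(b + ri) for b, ri in zip(base, r)]
        lams = [sample_dirichlet(alpha, rng) for _ in range(batch)]
        rewards = [reward_fn(lam) for lam in lams]
        baseline = sum(rewards) / batch          # batch-mean baseline for variance reduction
        grad = [0.0] * len(r)
        for lam, rew in zip(lams, rewards):
            for i, s in enumerate(score_wrt_residual(alpha, lam)):
                grad[i] += (rew - baseline) * s / batch
        r = [ri + lr * gi for ri, gi in zip(r, grad)]  # gradient ascent on expected reward
    alpha = [math.exp(b + ri) for b, ri in zip(base, r)]
    total = sum(alpha)
    return [a / total for a in alpha]


if __name__ == "__main__":
    lam0 = [0.5, 0.3, 0.2]    # hypothetical distilled mixing weights
    target = [0.2, 0.3, 0.5]  # optimum of the toy reward (unknown to the policy)

    def toy_reward(lam):
        """Stand-in for a preference reward model: higher when lam is near target."""
        return -sum((a - b) ** 2 for a, b in zip(lam, target))

    print([round(m, 3) for m in train_residual_policy(lam0, toy_reward)])
```

The policy here has only as many trainable numbers as there are mixing weights, which mirrors the abstract's point that the RL search stays in the low-dimensional solver space rather than touching backbone weights; the paper's actual estimator, reward model, and hyperparameters are not described in this summary.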
