In this paper, we propose Diffusion-Q Synergy (DQS), a novel algorithm that integrates diffusion models with reinforcement learning. The methodology formalizes an equivalence between the iterative denoising process in diffusion models and the policy improvement mechanism in Markov Decision Processes. Central to this framework is a dual-learning mechanism: (1) a parametric Q-function is trained through temporal difference learning to evaluate noise prediction trajectories, serving as a differentiable critic for assessing action quality; and (2) the learned Q-function is then integrated into the training objective of a conditional diffusion model, yielding a constrained optimization problem that maximizes expected returns while minimizing policy deviation from behavioral priors. The advantage of DQS stems from its hybrid architecture, which combines (i) diffusion policy cloning for stable behavior regularization and (ii) adaptive noise rectification, in which Q-values guide the correction of key denoising steps. The latter is particularly effective for refining suboptimal action sequences and thereby guides the entire diffusion trajectory toward policy optimality. Ablation studies across benchmark environments demonstrate statistically significant performance improvements (p < 0.01) over baseline methods in both computational efficiency and asymptotic policy quality. The implementation has been open-sourced at AOLIGOOD/DiffusionQSynergy to facilitate reproducibility.
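The dual-learning mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weighting coefficient `eta`, and the exact form of the combined loss are assumptions made for illustration, following the common pattern of a denoising (behavior cloning) loss combined with a Q-value term. The Q-guided correction of key denoising steps is not modeled here.

```python
# Hypothetical sketch; names and the loss form are assumptions, not the paper's code.

def td_target(reward, next_q, gamma=0.99, done=False):
    """One-step temporal-difference target: r + gamma * Q(s', a')."""
    return reward + (0.0 if done else gamma * next_q)


def td_loss(q_value, reward, next_q, gamma=0.99, done=False):
    """Squared TD error, the quantity minimized when fitting the critic."""
    return (q_value - td_target(reward, next_q, gamma, done)) ** 2


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and injected noise.

    This plays the role of the behavior cloning term that keeps the
    diffusion policy close to the behavioral prior.
    """
    n = len(true_noise)
    return sum((p - t) ** 2 for p, t in zip(predicted_noise, true_noise)) / n


def combined_policy_loss(predicted_noise, true_noise, q_of_action, eta=1.0):
    """Denoising loss minus a weighted Q-value (assumed form).

    Minimizing this trades off staying near the data (first term)
    against choosing actions the critic scores highly (second term).
    """
    return denoising_loss(predicted_noise, true_noise) - eta * q_of_action
```

In this sketch, a larger `eta` pushes the policy toward actions with high critic scores, while a smaller `eta` keeps it closer to the behavior data.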
A Li
Xinghui Zhu
Haoyi Que
Applied Sciences
Hunan Agricultural University
Shenzhen Polytechnic
Li et al. studied this question.
www.synapsesocial.com/papers/68d461c231b076d99fa6106a — DOI: https://doi.org/10.3390/app151810141