What type of study is this?

This is a quantitative study.

October 20, 2025 · Open Access

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models

Key Points

Advantage Weighted Matching (AWM) substantially accelerates the convergence of RL fine-tuning for diffusion models by optimizing a lower-variance objective.
AWM achieves up to a 24× speedup over Flow-GRPO on the GenEval, OCR, and PickScore benchmarks without compromising generation quality.
By reusing the pretraining score/flow-matching objective and weighting each sample by its advantage, AWM raises the influence of high-reward samples during RL.
This approach aligns pretraining and RL in a unified framework that is consistent with policy-gradient theory.

Abstract

Reinforcement Learning (RL) has emerged as a central paradigm for advancing Large Language Models (LLMs), where pre-training and RL post-training share the same log-likelihood formulation. In contrast, recent RL approaches for diffusion models, most notably Denoising Diffusion Policy Optimization (DDPO), optimize an objective that differs from the pretraining objective, the score/flow-matching loss. In this work, we provide a theoretical analysis showing that DDPO is an implicit form of score/flow matching with noisy targets, which increases variance and slows convergence. Building on this analysis, we introduce Advantage Weighted Matching (AWM), a policy-gradient method for diffusion models. It uses the same score/flow-matching loss as pretraining to obtain a lower-variance objective and reweights each sample by its advantage. In effect, AWM raises the influence of high-reward samples and suppresses low-reward ones while keeping the modeling objective identical to pretraining. This unifies pretraining and RL both conceptually and practically, remains consistent with policy-gradient theory, reduces variance, and yields faster convergence. This simple yet effective design delivers substantial benefits: on the GenEval, OCR, and PickScore benchmarks, AWM achieves up to a 24× speedup over Flow-GRPO (which builds on DDPO) when applied to Stable Diffusion 3.5 Medium and FLUX, without compromising generation quality. Code is available at https://github.com/scxue/advantage_weighted_matching.
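The abstract describes AWM at the level of its objective: a pretraining-style flow-matching loss, reweighted per sample by an advantage. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes a rectified-flow path x_t = (1 - t)·x0 + t·noise with velocity target noise - x0, GRPO-style group-normalized advantages, and uniform time weighting; the names `velocity_fn`, `group_advantages`, `flow_matching_loss`, and `awm_loss` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical velocity model: maps (x_t, t) to a predicted velocity vector.
VelocityFn = Callable[[Vector, float], Vector]


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Group-normalized advantages (r - mean) / std, as in GRPO-style methods.

    Assumption: the abstract does not specify how AWM computes advantages.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def flow_matching_loss(velocity_fn: VelocityFn, x0: Vector, noise: Vector, t: float) -> float:
    """Per-sample flow-matching loss on the path x_t = (1 - t) * x0 + t * noise.

    The regression target is the path velocity dx_t/dt = noise - x0
    (one common rectified-flow convention, assumed here).
    """
    x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, noise)]
    target = [e - a for a, e in zip(x0, noise)]
    pred = velocity_fn(x_t, t)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)


def awm_loss(
    velocity_fn: VelocityFn,
    samples: Sequence[Vector],
    noises: Sequence[Vector],
    times: Sequence[float],
    rewards: Sequence[float],
) -> float:
    """Advantage-weighted flow-matching loss over one group of generated samples.

    Each sample's pretraining-style loss is scaled by its advantage, so
    minimizing the result lowers the matching loss of high-reward samples
    and increases it for low-reward ones. In a real trainer the advantages
    would be treated as constants (no gradient flows through them).
    """
    advantages = group_advantages(rewards)
    per_sample = [
        flow_matching_loss(velocity_fn, x0, e, t)
        for x0, e, t in zip(samples, noises, times)
    ]
    return sum(a * l for a, l in zip(advantages, per_sample)) / len(per_sample)


def zero_model(x_t: Vector, t: float) -> Vector:
    """Hypothetical toy model that always predicts zero velocity."""
    return [0.0] * len(x_t)


if __name__ == "__main__":
    samples = [[1.0], [0.0]]  # generated clean samples x0
    noises = [[0.0], [0.0]]   # noise used to build the forward path
    times = [0.3, 0.7]        # sampled time steps
    rewards = [1.0, 0.0]      # first sample scored higher by the reward model
    print(awm_loss(zero_model, samples, noises, times, rewards))  # ≈ 0.5
```

In this sketch, a group whose samples all receive the same reward gets zero advantages and therefore contributes no update. The authors' released code may add details that the abstract does not describe, such as time-step weighting or regularization.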
