What type of study is this?

This is an experimental study.

October 8, 2025 · Open Access

Di[M]O: Distilling Masked Diffusion Models into One-step Generator

Yuanzhi Zhu (Second People’s Hospital of Yibin), Xi Wang (Centre National de la Recherche Scientifique), Stéphane Lathuilière (Centre National de la Recherche Scientifique)

Key Points

Di[M]O achieves image generation quality competitive with its multi-step teacher models while drastically reducing inference time.
It distills masked diffusion models into a one-step generator using token-level distribution matching and a token initialization strategy.
The approach addresses two challenges: the intractability of using intermediate-step information for one-step generation, and the lack of entropy in the initial distribution.
According to the authors, it is the first one-step distillation of masked diffusion models and the first application of discrete distillation to text-to-image generation.

Abstract

Masked Diffusion Models (MDMs) have emerged as a powerful generative modeling technique. Despite their remarkable results, they typically suffer from slow inference because sampling requires several steps. In this paper, we propose Di[M]O, a novel approach that distills masked diffusion models into a one-step generator. Di[M]O addresses two key challenges: (1) the intractability of using intermediate-step information for one-step generation, which we solve through token-level distribution matching that optimizes the model's output logits in an on-policy framework with the help of an auxiliary model; and (2) the lack of entropy in the initial distribution, which we address through a token initialization strategy that injects randomness while maintaining similarity to the teacher's training distribution. We show Di[M]O's effectiveness on both class-conditional and text-conditional image generation, achieving performance competitive with multi-step teacher outputs while drastically reducing inference time. To our knowledge, we are the first to successfully achieve one-step distillation of masked diffusion models and the first to apply discrete distillation to text-to-image generation, opening new paths for efficient generative modeling.
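
The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the divergence, the mask token id, or how randomness is injected, so the names `MASK_ID`, `init_tokens`, `token_level_kl` and the `random_ratio` parameter are hypothetical, and KL divergence is used only as a stand-in for whatever token-level objective the paper optimizes.

```python
import numpy as np

MASK_ID = 0  # hypothetical id reserved for the mask token (assumption)


def init_tokens(seq_len, vocab_size, random_ratio, rng):
    """Start from an all-mask sequence, then overwrite a fraction of
    positions with random non-mask tokens to add entropy (assumed scheme)."""
    tokens = np.full(seq_len, MASK_ID, dtype=np.int64)
    n_random = int(round(random_ratio * seq_len))
    positions = rng.choice(seq_len, size=n_random, replace=False)
    tokens[positions] = rng.integers(1, vocab_size, size=n_random)
    return tokens


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def token_level_kl(teacher_logits, aux_logits):
    """Per-token KL(teacher || auxiliary) for logits of shape
    (seq_len, vocab_size); returns an array of shape (seq_len,)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(aux_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)


rng = np.random.default_rng(0)
tokens = init_tokens(seq_len=16, vocab_size=1024, random_ratio=0.25, rng=rng)
teacher_logits = rng.normal(size=(16, 1024))
aux_logits = rng.normal(size=(16, 1024))
per_token_loss = token_level_kl(teacher_logits, aux_logits)
```

In this sketch the divergence is computed independently at every token position, which is what "token-level" is taken to mean here; in an actual distillation setup the logits would come from the teacher and auxiliary networks evaluated on the one-step generator's own samples.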
