Diffusion models are a class of learned generative models that operate by transforming samples of Gaussian noise into samples from a target distribution by repeated denoising. One way of formulating diffusion models is through the probability flow ordinary differential equation (PF ODE), which characterizes the trajectory from noise to clean data by a deterministic ODE. Notably, under this formulation, the probability density of a generated sample can be computed exactly using the instantaneous change of variables (ICOV) formula. This capability raises the question of whether diffusion models can be used in applications where probability density evaluation is crucial, such as rare event estimation. In this work we experimentally test whether diffusion models might be suitable for such tasks. We select two simple distributions---a Gaussian and a Brownian motion---which are each in two dimensions and are each made conditional on samples lying outside a forbidden region near the origin. For evaluation purposes, the conditional Gaussian's distribution can be evaluated analytically and the conditional Brownian motion's can be written in terms of a non-elementary integral which we can evaluate numerically with very high accuracy. We qualitatively and quantitatively examine the effect of a number of choices on the accuracy of the approximation extracted from trained diffusion models for each of the two distributions: the ODE integrator for solving the PF and ICOV ODEs to extract sample and probability estimates; the number of samples used in both histogram- and ICOV-based density estimates; various parameters of the distributions and samples, such as the size of the forbidden region near the origin; and the training effort used to create the models. We draw two conclusions. First, it is relatively easy to choose algorithms and parameters such that training effort is the dominant driver of accuracy. 
Second, these models cannot achieve sufficient accuracy for use in tasks such as rare event simulation, even for these low-dimensional toy problems (at least for the model size that we used).
Justice Sefas (Thu,) studied this question.
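
The instantaneous change of variables computation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical linear vector field f(x, t) = a x in place of a trained diffusion model's PF ODE drift, starts from a standard Gaussian, and integrates the sample together with its log-density using d log p / dt = -div f. All function names and the fixed-step RK4 integrator are illustrative choices. The linear field is chosen only because the transported density is then known in closed form, so the estimate can be checked.

```python
import numpy as np

A_COEFF = 0.5  # hypothetical coefficient of the toy linear field (assumption)

def drift(x, t):
    # Toy stand-in for a learned PF ODE drift: f(x, t) = a * x.
    return A_COEFF * x

def divergence(x, t):
    # Divergence of f(x, t) = a * x in d dimensions is a * d.
    return A_COEFF * x.shape[-1]

def augmented_rhs(state, t):
    # State is [x_1, ..., x_d, log p]; ICOV gives d(log p)/dt = -div f.
    x = state[:-1]
    return np.append(drift(x, t), -divergence(x, t))

def rk4(rhs, state, t0, t1, n_steps):
    # Classical fixed-step fourth-order Runge-Kutta integrator.
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = rhs(state, t)
        k2 = rhs(state + 0.5 * h * k1, t + 0.5 * h)
        k3 = rhs(state + 0.5 * h * k2, t + 0.5 * h)
        k4 = rhs(state + h * k3, t + h)
        state = state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return state

def log_standard_normal(x):
    # Log-density of a standard Gaussian in d dimensions.
    d = x.shape[-1]
    return -0.5 * np.dot(x, x) - 0.5 * d * np.log(2.0 * np.pi)

def transport_with_density(x0, t1=1.0, n_steps=100):
    # Integrate the sample and its log-density jointly from time 0 to t1.
    state0 = np.append(x0, log_standard_normal(x0))
    state1 = rk4(augmented_rhs, state0, 0.0, t1, n_steps)
    return state1[:-1], state1[-1]

if __name__ == "__main__":
    x0 = np.array([0.3, -1.2])
    x1, logp1 = transport_with_density(x0)
    # For this toy field, x(t) = exp(a t) x0 and the density stays Gaussian,
    # so the exact log-density is log p0(x0) - a * d * t.
    exact = log_standard_normal(x0) - A_COEFF * x0.shape[-1] * 1.0
    print(x1, logp1, exact)
```

With a trained model, the drift would come from the learned score or denoiser and the divergence would typically need automatic differentiation or a stochastic trace estimator; both are outside the scope of this sketch.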