March 1, 2001

The Art of Data Augmentation

Abstract

The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong’s Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang’s algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such
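To make the deterministic/stochastic distinction concrete, here is a minimal sketch using the classic genetic linkage data associated with the Dempster, Laird, and Rubin EM article: four counts with cell probabilities (1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4). This example is chosen for illustration and is not taken from this paper. The augmentation splits the first cell into two latent counts with probabilities 1/2 and θ/4; the same latent variable then yields an EM algorithm and a Tanner–Wong style two-step sampler. The function names `em` and `data_augmentation`, the uniform prior on θ, and the burn-in length are assumptions made for this sketch.

```python
import random

# Linkage data: counts in four cells with probabilities
# (1/2 + t/4, (1 - t)/4, (1 - t)/4, t/4), 0 <= t <= 1.
Y = (125, 18, 20, 34)


def em(theta=0.5, iters=50):
    """Hypothetical helper: EM for t, treating the t/4 share of cell 1 as missing."""
    y1, y2, y3, y4 = Y
    for _ in range(iters):
        # E-step: expected latent count falling in the t/4 part of cell 1
        x = y1 * (theta / 4) / (0.5 + theta / 4)
        # M-step: complete-data MLE; t appears in (x + y4) cells vs (y2 + y3)
        theta = (x + y4) / (x + y2 + y3 + y4)
    return theta


def data_augmentation(n_draws=5000, theta=0.5, seed=0):
    """Hypothetical helper: DA sampler for t under an assumed Uniform(0, 1) prior."""
    rng = random.Random(seed)
    y1, y2, y3, y4 = Y
    draws = []
    for _ in range(n_draws):
        # Imputation step: x | t, Y ~ Binomial(y1, t / (t + 2)),
        # drawn as a sum of Bernoulli trials to stay within the stdlib
        p = theta / (theta + 2)
        x = sum(rng.random() < p for _ in range(y1))
        # Posterior step: t | x, Y ~ Beta(x + y4 + 1, y2 + y3 + 1)
        theta = rng.betavariate(x + y4 + 1, y2 + y3 + 1)
        draws.append(theta)
    return draws


if __name__ == "__main__":
    print(round(em(), 4))  # 0.6268, the observed-data MLE
    kept = data_augmentation()[500:]  # discard an assumed burn-in of 500 draws
    print(sum(kept) / len(kept))  # Monte Carlo estimate of the posterior mean
```

Both routines rely on one augmentation: EM replaces the latent count by its conditional expectation, while the sampler draws it, which is the sense in which one latent-variable construction serves both optimization and simulation.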

Cite This Study

van Dyk et al. (2001). The Art of Data Augmentation.

synapsesocial.com/papers/69d7b9e91f14cb2b27b8a8fa https://doi.org/10.1198/10618600152418584