Abstract: This survey examines recent non-generative applications of diffusion models to visual data, systematically reviewing research papers from top-tier Artificial Intelligence / Machine Learning conferences, leading journals, and arXiv preprints. Our analysis focuses on the discriminative use of vision diffusion models (DMs) beyond image generation. We propose a taxonomy that divides existing applications of DMs into four groups: content detection, action understanding, spatiotemporal view estimation, and representation learning, each with finer-grained subdivisions. Our analysis shows that diffusion models achieve superior performance in uncertainty-critical discriminative tasks, including pose estimation, anomaly detection, semantic correspondence, and depth estimation, but universally face computational overhead, with inference 10 to 100 times slower than that of their discriminative alternatives. The survey identifies key advantages of diffusion-based data analysis: better handling of ambiguous ground truth, inherent uncertainty quantification, and rich representations of foundational characteristics. We also highlight promising hybrid approaches that combine diffusion and discriminative methods to maintain high performance while addressing computational constraints. The paper gives machine learning practitioners systematic guidelines for applying diffusion models to visual analysis tasks and identifies critical research gaps in efficiency optimization and cross-domain generalization.
Olechno et al. (Thu,) studied this question.