Abstract

Motivation
Missing data imputation remains a critical challenge in high-dimensional time-series data analysis, where traditional methods often struggle to capture complex nonlinear dependencies inherent in sequential data. Diffusion-based generative models have shown state-of-the-art performance by modeling the conditional distribution of missing values given observed data. However, these models typically rely on isotropic white noise during training, which can obscure important frequency-dependent correlations that are crucial for accurate imputation.

Results
To address the limitations of conventional imputation methods, we propose a novel approach called time-varying blue noise-based conditional score-based diffusion model (tBN-CSDI). By modulating the noise schedule according to the frequency characteristics of the data, tBN-CSDI improves the recovery of subtle, high-frequency temporal patterns that are often overlooked by existing techniques. Experimental results on both healthcare and single-cell RNA-seq datasets show that tBN-CSDI consistently outperforms existing imputation methods, achieving over a 30% reduction in imputation error under high data sparsity. These findings underscore tBN-CSDI’s potential as a robust and effective solution for imputing sparse and noisy time-series data. We further discuss its practical applications in improving change-point detection and gene regulatory network inference, demonstrating its broader utility in biomedical and biological research.

Availability
The computer code and data for the proposed method are available on GitHub: https://github.com/gbishop345/tBN-CSDI.
Bishop et al. studied this question.
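To make the idea of time-varying blue noise from the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation (which is in the linked repository). It assumes that blue noise can be approximated by scaling the spectrum of white noise so that power grows linearly with frequency, and that a diffusion step `t` sets a blending weight between blue and white noise. The function names `blue_noise` and `time_varying_noise`, the linear weight `gamma`, and the direction of the blend are all hypothetical choices made for illustration.

```python
import numpy as np


def blue_noise(length, rng):
    """Return unit-variance noise whose power increases with frequency.

    Assumption: power proportional to f (amplitude proportional to sqrt(f)).
    """
    white = rng.standard_normal(length)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(length)        # 0 ... 0.5 cycles per sample
    spectrum = spectrum * np.sqrt(freqs)   # also zeroes the DC component
    shaped = np.fft.irfft(spectrum, n=length)
    return shaped / shaped.std()


def time_varying_noise(length, t, num_steps, rng):
    """Blend blue and white noise with a weight that depends on step t.

    Hypothetical schedule: gamma = 1 (pure blue) at t = 0 and
    gamma = 0 (pure white) at t = num_steps - 1.
    """
    gamma = 1.0 - t / (num_steps - 1)
    white = rng.standard_normal(length)
    blue = blue_noise(length, rng)
    mixed = gamma * blue + (1.0 - gamma) * white
    # The two components are independent with unit variance, so this
    # rescaling keeps the variance of the blend close to one.
    return mixed / np.sqrt(gamma**2 + (1.0 - gamma) ** 2)


rng = np.random.default_rng(0)
sample = time_varying_noise(length=1024, t=10, num_steps=50, rng=rng)
```

In this sketch the blend is renormalized so that the noise variance stays roughly constant across steps; only the spectral shape changes with `t`. Whether tBN-CSDI uses this particular spectrum or schedule is not stated in the abstract.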