What question did this study set out to answer?

This research aims to develop a framework for completing gene expression profiles in single-cell RNA sequencing data, addressing sparsity issues.

March 19, 2026

SAD: Sparse-Aware Diffusion Model for Single-Cell Gene Expression Completion

Key Points

Developed a diffusion-based framework called SAD for gene completion.
Focused on managing high missing rates and correcting sparsity bias.
Conducted extensive benchmarks against existing completion methods across multiple metrics.
SAD outperforms current imputation methods, particularly in extreme scenarios with missing rates above 80%.
Generated reliable expression profiles with more than 30,000 genes from sparse data.
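The benchmark setting in the points above (completion at a given missing rate, scored by several metrics) can be illustrated with a minimal sketch. For illustration only, it assumes that a "missing rate" means a random fraction of entries in a cells x genes matrix is hidden and must be recovered, and that quality is scored by RMSE and Pearson correlation on the hidden entries. The paper's actual masking protocol and metrics may differ. The helpers `mask_entries`, `gene_mean_baseline` and `score_completion` are hypothetical and are not part of SAD. The per-gene-mean baseline is only a naive reference point that any completion model should beat.

```python
import numpy as np


def mask_entries(x, missing_rate, rng):
    """Hide a random fraction of the entries of x (cells x genes).

    Hypothetical helper: returns the observed matrix, with hidden entries set
    to 0 as a stand-in for dropouts, and a boolean mask where True marks a
    held-out entry.
    """
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    hidden = rng.random(x.shape) < missing_rate
    observed = np.where(hidden, 0.0, x)
    return observed, hidden


def gene_mean_baseline(observed, hidden):
    """Naive completion: fill each hidden entry with its gene's mean over observed cells."""
    counts = (~hidden).sum(axis=0)
    sums = np.where(hidden, 0.0, observed).sum(axis=0)
    # Genes with no observed cell fall back to 0 instead of dividing by zero.
    gene_means = np.divide(sums, counts, out=np.zeros(sums.shape), where=counts > 0)
    return np.where(hidden, gene_means[np.newaxis, :], observed)


def score_completion(x_true, x_pred, hidden):
    """RMSE and Pearson correlation computed only on the held-out entries."""
    t = x_true[hidden]
    p = x_pred[hidden]
    rmse = float(np.sqrt(np.mean((t - p) ** 2)))
    pearson = float(np.corrcoef(t, p)[0, 1])
    return {"rmse": rmse, "pearson": pearson}


# Usage on synthetic Poisson counts with a different mean per gene, 80% hidden.
rng = np.random.default_rng(0)
gene_rates = rng.uniform(0.5, 5.0, size=50)
x_true = rng.poisson(lam=gene_rates, size=(200, 50)).astype(float)
observed, hidden = mask_entries(x_true, missing_rate=0.8, rng=rng)
x_pred = gene_mean_baseline(observed, hidden)
print(score_completion(x_true, x_pred, hidden))
```

Scoring only the held-out entries keeps the metric from being inflated by entries the method was given as input.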

Abstract

Single-cell RNA sequencing (scRNA-seq) is entering an era of foundation models that accept the complete gene atlas as input, yet most current datasets cover only 10-12k genes and contain numerous technical zeros, severely limiting the generalization of these models in downstream tasks. To address this, we pioneer the gene-completion task for scRNA-seq and present SAD, a diffusion-based framework tailored to extremely sparse data, capable of completing genes and correcting sparsity bias under high missing rates. Unlike imputation or reconstruction methods that rely on the i.i.d. assumption, SAD's completion paradigm can generate gene entries originally absent from the expression profile, detect and rectify sparsity-distribution bias, and supply foundation models with consistent, reliable inputs of more than 30k genes. Extensive benchmarks show that SAD significantly outperforms existing methods across multiple completion metrics, particularly in extreme scenarios with missing rates above 80%. This provides a data foundation for reusing missing scRNA-seq information and for precision-medicine applications. The code is available at https://github.com/ZhangLab312/SAD.
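As a point of reference for the diffusion-based formulation described in the abstract, the following sketch shows the standard DDPM forward (noising) step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied only to the entries that are to be completed. Observed entries are copied unchanged so a denoiser could condition on them, as in generic conditional-diffusion imputation. This is a generic illustration, not SAD's published model. The linear noise schedule, the log1p transform, the 80% target fraction and the names `linear_alpha_bars`, `noise_targets` and `masked_eps_loss` are all assumptions. The sparsity-aware components that distinguish SAD are described in the paper and repository and are not reproduced here.

```python
import numpy as np


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear DDPM noise schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_targets(x0, target_mask, t, alpha_bars, rng):
    """Forward-noise only the entries to be completed at diffusion step t.

    On target entries: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    Observed entries are copied from x0 unchanged.
    Returns x_t and the noise eps (zeroed outside the targets).
    """
    eps = np.where(target_mask, rng.standard_normal(x0.shape), 0.0)
    a = alpha_bars[t]
    x_t = np.where(target_mask, np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, x0)
    return x_t, eps


def masked_eps_loss(eps_pred, eps, target_mask):
    """Mean squared error of a denoiser's noise prediction, over target entries only."""
    diff = (eps_pred - eps)[target_mask]
    return float(np.mean(diff ** 2))


# Usage: log-normalised counts, with about 80% of entries treated as completion targets.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=1.5, size=(4, 10))
x0 = np.log1p(counts.astype(float))
target_mask = rng.random(x0.shape) < 0.8
alpha_bars = linear_alpha_bars(num_steps=1000)
x_t, eps = noise_targets(x0, target_mask, t=500, alpha_bars=alpha_bars, rng=rng)
# A trained denoiser would predict eps from x_t, the observed entries and t;
# an all-zeros prediction stands in for it here.
print(masked_eps_loss(np.zeros_like(eps), eps, target_mask))
```

Restricting both the noise and the loss to the target mask means the model is only trained to generate the missing entries, while the observed ones serve as fixed context.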
