Spatial transcriptomics integrates morphological information from pathology images with gene expression, providing high-resolution spatial gene expression profiles while preserving tissue architecture in a cost-effective manner. However, the inherent heterogeneity between image and gene expression data, coupled with the sparse distribution of gene expression, poses significant challenges for accurate and unbiased prediction. To address these issues, we propose Img2Gene, a debiased framework that predicts gene expression levels from whole slide images by incorporating biological context. Specifically, we integrate causal analysis into the gene expression prediction task to mitigate data sparsity and achieve unbiased predictions. Furthermore, we employ gene set enrichment analysis to identify highly associated pathway information as biological context, and we introduce a cross-modal coherence loss that aligns the different modalities, strengthening the interplay among their features and improving the accuracy of gene expression prediction. Extensive experiments on four public datasets demonstrate that our method achieves state-of-the-art performance. The pathway data and source code are available at https://github.com/coffeeNtv/Img2Gene.
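To make the idea of aligning modalities concrete, the following is a minimal illustrative sketch and not the Img2Gene implementation: the exact form of the cross-modal coherence loss is not specified here. It assumes, purely for illustration, that each image patch and its biological context are represented as paired embedding vectors and that misalignment is penalized by cosine distance. The names `cosine_similarity` and `alignment_loss` are hypothetical.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + eps)


def alignment_loss(image_embs, context_embs):
    """Mean cosine distance (1 - cosine similarity) over paired embeddings.

    Assumption for illustration: row i of image_embs is paired with
    row i of context_embs. The loss is near 0 when pairs point in the
    same direction and near 2 when they point in opposite directions.
    """
    if len(image_embs) != len(context_embs):
        raise ValueError("image_embs and context_embs must have the same length")
    if not image_embs:
        raise ValueError("at least one embedding pair is required")
    distances = [
        1.0 - cosine_similarity(u, v) for u, v in zip(image_embs, context_embs)
    ]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    image_embs = [[1.0, 0.0], [0.0, 1.0]]
    context_embs = [[1.0, 0.0], [1.0, 0.0]]
    # First pair is aligned, second pair is orthogonal.
    print(alignment_loss(image_embs, context_embs))
```

In a trained model, a term of this kind would typically be added to the main regression objective with a weighting coefficient; that weighting, like the cosine form itself, is an assumption of this sketch.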
Zhang et al. studied this question.