Advances in spatial transcriptomics have enabled high-resolution mapping of tissue architecture at the molecular level, yet integrating its multi-modal data remains challenging. Here, we present stGCL, a framework for accurate and robust integration of gene expression, spatial coordinates, and histological features. stGCL employs a histology-based Vision Transformer to extract morphological features and a multi-modal graph autoencoder with contrastive learning for cross-modal fusion. In addition, we introduce a spatial coordinate correction and registration strategy to support multi-slice integration. We demonstrate that stGCL reliably identifies spatial domains, integrates vertical and horizontal tissue slices, and highlight its generalizability across platforms and resolutions.
Yu et al. (Wed,) studied this question.