March 3, 2026 · Open Access

stGCL: a versatile cross-modality fusion method based on multi-modal graph contrastive learning for spatial transcriptomics

Key Points

stGCL identifies spatial domains from multi-modal data, improving the mapping of tissue architecture.
A histology-based Vision Transformer extracts morphological features from tissue images.
A multi-modal graph autoencoder with contrastive learning fuses gene expression, spatial coordinates, and histological features.
The method generalizes across platforms and resolutions.

Abstract

Advances in spatial transcriptomics have enabled high-resolution mapping of tissue architecture at the molecular level, yet integrating the resulting multi-modal data remains challenging. Here, we present stGCL, a framework for accurate and robust integration of gene expression, spatial coordinates, and histological features. stGCL employs a histology-based Vision Transformer to extract morphological features and a multi-modal graph autoencoder with contrastive learning for cross-modal fusion. In addition, we introduce a spatial coordinate correction and registration strategy to support multi-slice integration. We demonstrate that stGCL reliably identifies spatial domains, integrates vertical and horizontal tissue slices, and generalizes across platforms and resolutions.
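
To make the idea of contrastive cross-modal fusion concrete, the following is a minimal sketch of a symmetric InfoNCE-style loss between two per-spot embedding matrices, for example one derived from gene expression and one from histology. It is an illustration only: the abstract does not specify the loss that stGCL uses, and the function name `info_nce`, the `temperature` value, and the array shapes are assumptions made for this example.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.5):
    """Symmetric InfoNCE-style loss between two views of the same spots.

    Hypothetical helper, not taken from stGCL. z_a and z_b have shape
    (n_spots, dim); row i in both arrays is assumed to describe the same spot.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature (assumed value).
    logits = z_a @ z_b.T / temperature

    def diagonal_cross_entropy(scores):
        # Row-wise log-softmax, computed stably by subtracting the row maximum.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The matching spot (the diagonal entry) is the positive pair.
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


# Toy usage with random stand-ins for real embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = info_nce(z, z)                        # rows match spot by spot
misaligned = info_nce(z, np.roll(z, 1, axis=0))  # rows shifted by one spot
```

In this toy setup the aligned pairing yields a lower loss than the shifted one, which is the behaviour a contrastive objective relies on: embeddings of the same spot from different modalities are pulled together, while embeddings of different spots are pushed apart.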
