Abstract

Spatially resolved transcriptomics (SRT) has revealed the heterogeneity of cancer samples. Yet the ability to computationally identify the full hierarchy of spatial organization in modern large-scale SRT data remains limited. Although many computational methods have been developed to identify spatial domains and cell types, most do not consider the entire hierarchy of spatial organization or explicitly model the relationships among adjacent hierarchical layers. Only a few models (BASS, CytoCommunity, and SpaTopic) model the relationship between spatial domains and their cell-type composition. However, BASS does not scale to the latest SRT data, and both CytoCommunity and SpaTopic require user-provided cell-type annotations and assume that these annotations are accurate. Our goal is to infer the hierarchical spatial organization of tissues from large-scale, high-dimensional SRT data, which requires uncovering hidden multi-scale structure in the measurements. To achieve this, we use a variational autoencoder (VAE) framework to learn low-dimensional latent variables that explain the observed gene expression, and we place biologically motivated prior distributions on these variables so that the inferred hierarchy remains close to reality. Concretely, our model consists of three coupled artificial neural networks: (1) multiple graph convolutional networks (GCNs) for modeling spatial domains with Gaussian Markov random field (GMRF) and Dirichlet priors, (2) a VAE for reconstructing gene expression and revealing cell types and cell-type-wise gene expression profiles, and (3) a feed-forward multi-layer perceptron for modeling cell types from spatial domains. Inference took 4 minutes when the model was applied to a Xenium dataset of 17K cells. We found that the domain annotations in breast cancer tissue separated the SRT data into three clear structural groups: invasive regions, DCIS type 1, and DCIS type 2.
Furthermore, when we compared our predictions with the ground truth, the inferred cell types revealed malignant subpopulations that were not captured by the original labels, highlighting additional layers of tumor heterogeneity. Importantly, we also identified niches where immune cells and cancer cells were co-localized; these niches differed in their proportions of T cells, plasma B cells, pDCs, and monocytes/macrophages. Our method explicitly models the relationship between adjacent layers in the hierarchy, thus revealing the co-varying relationships among cell types and the cell-type compositions within spatial domains. These relationships cannot be directly identified by the many previous methods that model cell types and spatial domains separately. We expect that this study can be used to identify distinct pathophysiological features in cancer tissues, which will help uncover biological heterogeneity closely associated with cancer progression.

Citation Format: Jeongbin Park, Tingrui Zhang, Cong Ma. Scalable cell type and spatial domain modeling using spatially informed topic inference of cancer niches [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6837.
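The layered structure described in the abstract (spatial neighborhood, then spatial domain, then cell type, then gene expression) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the unweighted neighbor averaging that stands in for a GCN layer, the single linear layer that stands in for the multi-layer perceptron, and the random weights are all hypothetical, and the GMRF and Dirichlet priors and the VAE training objective are omitted entirely.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def smooth_over_neighbors(features, neighbors):
    """Average each cell's features with those of its spatial neighbors.

    Stand-in for a graph convolution (assumption: no learned weights here).
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + list(nbrs)
        dim = len(features[i])
        out.append([sum(features[j][d] for j in group) / len(group)
                    for d in range(dim)])
    return out


def forward(features, neighbors, W_domain, W_type, profiles):
    """Toy pass: neighborhood -> domain -> cell type -> expected expression."""
    results = []
    for h in smooth_over_neighbors(features, neighbors):
        domain = softmax(matvec(W_domain, h))    # domain membership per cell
        ctype = softmax(matvec(W_type, domain))  # cell-type probs given domain
        # Expected expression: cell-type profiles mixed by their probabilities.
        expr = [sum(ctype[k] * profiles[k][g] for k in range(len(profiles)))
                for g in range(len(profiles[0]))]
        results.append((domain, ctype, expr))
    return results


# Hypothetical toy sizes: 4 cells on a chain, 3 features, 2 domains,
# 3 cell types, 4 genes. All values are random placeholders, not data.
rng = random.Random(0)
n_cells, n_feat, n_domains, n_types, n_genes = 4, 3, 2, 3, 4
features = [[rng.random() for _ in range(n_feat)] for _ in range(n_cells)]
neighbors = [[1], [0, 2], [1, 3], [2]]
W_domain = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_domains)]
W_type = [[rng.gauss(0, 1) for _ in range(n_domains)] for _ in range(n_types)]
profiles = [[rng.random() for _ in range(n_genes)] for _ in range(n_types)]

outputs = forward(features, neighbors, W_domain, W_type, profiles)
```

Each entry of `outputs` holds one cell's domain membership, its cell-type probabilities, and its expected expression vector. Because the cell-type probabilities are computed from the domain membership rather than independently, the sketch mirrors the abstract's point that adjacent layers of the hierarchy are modeled jointly.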