Abstract

Motivation: Analysis workflows for highly multiplexed imaging technologies typically summarize each cell by its post-segmentation mean expression. However, additional cellular information can be quantified, including cell morphology, sub-cellular expression patterns, and spatial cellular context, ultimately giving a multi-modal view of each cell. While deep learning models such as variational autoencoders are well established for other multi-modal single-cell assays, their ability to integrate these multiple views of a cell from highly multiplexed imaging data remains largely unknown.

Results: Here, we explore the ability of multi-modal variational autoencoders to learn unified latent cellular representations from multiple views of each single cell quantified from highly multiplexed imaging, including mean expression, morphology, sub-cellular protein co-localization, and spatial cellular context, while conditioning on technical and batch-specific effects. We show that the integrated multi-modal latent space is often more strongly associated with patient-specific clinical outcomes than a set of existing baselines. In addition, we perform ablation analyses to understand which input views contribute to model performance, and explore the ability of these models to learn cellular representations that align with cellular phenotypes and enable integration across divergent datasets.

Availability and implementation: hmiVAE is implemented as a Python package and is available at https://github.com/camlab-bioml/hmiVAE
Ayub et al. studied this question.