Abstract

Single-cell multimodal datasets often exhibit heterogeneous and incomplete modality coverage, posing a challenge for data integration known as mosaic integration. Here, we present Palette, a flexible and interpretable computational framework for mosaic integration of single-cell multimodal data. Palette employs a variant of principal component analysis to disentangle technical noise from biological variation, and leverages the topological structure of the data to accommodate imbalanced modality composition. In systematic benchmarks, Palette consistently outperforms state-of-the-art mosaic integration algorithms, while robustly mixing datasets with various modality compositions. Applied to complex scenarios such as cross-condition and cross-species analyses, Palette preserves meaningful biological signals, enabling the identification of condition-specific cell states and rare subpopulations. We further demonstrate that Palette extends beyond single-cell mosaic integration to accommodate other challenging scenarios. Together, these results position Palette as a robust and versatile framework for harmonizing complex multimodal datasets and facilitating their joint analysis across diverse biological contexts.
Sheng et al. studied this question.