Background: Heterogeneous data integration remains a major challenge in intelligent information systems, particularly under missing-modality and cross-domain conditions. Existing multimodal fusion approaches often rely on complete datasets and weak alignment mechanisms, limiting their robustness and practical applicability.

Objectives: This study aims to develop and evaluate a genomics-guided multimodal representation learning framework that enables robust heterogeneous data fusion, reliable cross-modal correspondence, and accurate prediction under incomplete-data conditions.

Methods: We propose a multimodal learning architecture that models genomics as the primary biological anchor and learns conditional projections to imaging modalities, including multiparametric MRI and whole-slide histopathology (WSI). The framework formulates multimodal fusion as a genomics-guided contrastive learning problem, incorporates domain-specific optimization constraints, and learns a latent shared-state representation to support inference without requiring fully paired datasets. Evaluation was conducted using public datasets, including TCGA-PRAD and TCIA, across low-risk versus higher-risk/clinically significant prostate cancer (csPCa) discrimination, Gleason-based risk stratification, and clinically significant outcome prediction tasks under realistic multimodal and missing-modality scenarios.

Results: In the adequately powered Genomics+WSI cohort (n = 486), the framework achieved an AUROC of 0.985 ± 0.005 for low-risk versus higher-risk/csPCa discrimination (p 0.90. Interpretability analysis revealed feature attributions aligned with domain-relevant genomic markers.

Conclusions: The proposed framework provides a scalable and generalizable solution for heterogeneous multimodal data fusion, supporting reliable prediction, robustness to missing modalities, and applicability to complex information systems beyond the studied domain.
Abdullah et al. (Tue,) studied this question.
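
The genomics-guided contrastive formulation described in the Methods can be illustrated with a minimal sketch. The abstract does not state the loss, so this example assumes a standard one-directional InfoNCE objective in which each patient's genomic embedding acts as the anchor and the same patient's imaging embedding is the positive. Patients without imaging are masked out rather than dropped from the dataset. The function names, the `paired_mask` argument, and the temperature value are all hypothetical and are not taken from the study.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def genomics_anchored_infonce(z_gen, z_img, paired_mask, temperature=0.1):
    """Assumed InfoNCE loss with genomics as the anchor modality.

    z_gen:       (n, d) genomic embeddings, one row per patient.
    z_img:       (n, d) imaging embeddings (rows for missing scans are ignored).
    paired_mask: (n,) booleans, True where the patient has imaging available.
    """
    paired_mask = np.asarray(paired_mask, dtype=bool)
    if not paired_mask.any():
        # No paired samples in this batch: the contrastive term contributes nothing.
        return 0.0

    g = l2_normalize(z_gen[paired_mask])
    v = l2_normalize(z_img[paired_mask])

    # Row i holds similarities between genomic anchor i and every imaging embedding.
    logits = g @ v.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for anchor i is imaging embedding i (same patient).
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_gen = rng.normal(size=(8, 16))
    z_img = z_gen + 0.01 * rng.normal(size=(8, 16))  # synthetic, well-aligned pairs
    mask = np.array([True] * 6 + [False] * 2)        # last two patients lack imaging

    print("aligned pairs:   ", genomics_anchored_infonce(z_gen, z_img, mask))
    print("mismatched pairs:", genomics_anchored_infonce(z_gen, np.roll(z_img, 1, axis=0), mask))
```

On synthetic data like this, correctly paired embeddings should give a lower loss than mismatched ones. In a trained model the embeddings would come from modality-specific encoders, and the mask is what lets batches mix fully paired and genomics-only patients.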