Early detection of disease is a cornerstone of improving patient outcomes, reducing costs, and enabling preventive interventions. Traditional predictive models often rely on a single type of data (e.g., imaging, clinical labs, or genomics). However, human health is inherently multimodal, spanning data sources such as electronic health records (EHRs), medical imaging, wearable sensor data, genomics, and clinical notes. Integrating these heterogeneous modalities into unified predictive models offers a path to richer, more accurate, and more robust disease detection. In this paper, we present a comprehensive survey of multimodal predictive modeling methods for early disease detection, illustrate architectural patterns and fusion strategies, discuss challenges (e.g., missing modalities, interpretability, generalizability, and bias), and highlight promising directions. We also propose a reference architecture for developing such systems and suggest evaluation best practices.
Hayat et al. (Tue,) studied this question.