Electronic Health Records (EHRs) continuously record patients’ health status in Intensive Care Units (ICUs), capturing multivariate irregularly sampled time series (MISTS) and unstructured clinical text. Existing studies focus primarily on handling the irregularity of each modality, and often overlook the complex intra- and inter-sequence interactions as well as the dependencies between short-term and long-term features. Moreover, clinical notes are typically semantically sparse and structurally noisy, which makes them difficult to interpret. To address these challenges, we propose a novel multimodal predictive model. For irregular numerical time-series data, we design a cross-view multi-scale framework that integrates cross-attention mechanisms with multi-scale convolutions. This framework dynamically models diverse temporal embeddings and captures both inter-variable interactions and cross-temporal dependencies, while reducing computational complexity. For clinical text, we adopt a retrieval-augmented technique that leverages external medical knowledge graphs (KGs) and large language models (LLMs) to enrich the text representations of medical codes. These enriched embeddings are then fused with the clinical-note embeddings via a gated mechanism, alleviating semantic sparsity. We validate the proposed approach on two critical clinical prediction tasks. Experimental results show relative F1 score improvements of up to 3.3%, 6.0%, and 3.4% for the MISTS, clinical notes, and multimodal fusion tasks, respectively, demonstrating the method’s strong predictive capability on clinical data.
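The gated fusion step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the gate's form, so the code assumes a common element-wise sigmoid gate computed from the concatenated inputs. The names `gated_fusion`, `note_emb`, `kg_emb`, `gate_weights`, and `gate_bias` are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    note_emb: List[float],
    kg_emb: List[float],
    gate_weights: List[List[float]],
    gate_bias: List[float],
) -> List[float]:
    """Fuse two equal-length embeddings with an element-wise gate (assumed form).

    gate  = sigmoid(W @ concat(note_emb, kg_emb) + b)
    fused = gate * note_emb + (1 - gate) * kg_emb
    """
    if len(note_emb) != len(kg_emb):
        raise ValueError("embeddings must have the same dimension")
    concat = note_emb + kg_emb  # list concatenation, length 2 * d
    fused = []
    for i, (n, k) in enumerate(zip(note_emb, kg_emb)):
        # One gate value per output dimension, from row i of W and entry i of b.
        pre_activation = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(pre_activation)
        fused.append(g * n + (1.0 - g) * k)
    return fused


# Toy example: zero weights and zero bias give a gate of 0.5 everywhere,
# so the fused vector is the plain average of the two inputs.
note = [1.0, 0.0]
kg = [0.0, 1.0]
W = [[0.0] * 4 for _ in range(2)]
b = [0.0, 0.0]
fused = gated_fusion(note, kg, W, b)
```

In this form the gate decides, per dimension, how much of the note embedding versus the knowledge-enriched embedding to keep. A large positive pre-activation passes the note embedding through almost unchanged, while a large negative one favors the enriched embedding.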
Wang et al. (Sun,) studied this question.