Machine learning (ML) models are increasingly used in healthcare for risk prediction and decision support, but their performance often declines after deployment because patient populations, clinical practices, and data completeness change over time. This study addresses three key challenges in reliable clinical ML: (1) temporal distribution shifts that reduce generalizability, (2) underreporting and missing data that bias outcomes, and (3) sequential decision-making under cost and uncertainty. We propose an integrated framework comprising a temporal evaluation protocol that measures degradation over time, a domain adaptation method under missingness shift (DAMS) that improves robustness as feature missingness changes, and a timing-aware reinforcement learning (RL) approach that considers when to intervene. Tested on seven large datasets, including SEER, MIMIC-IV, and CDC COVID-19, our methods improve calibration, robustness, and efficiency. For example, positive-unlabeled (PU) learning increased COVID-19 outcome prediction accuracy by 6–9%, DAMS reduced the AUROC drop by almost 40%, and timing-aware RL achieved higher rewards with lower observation costs. These results show that static evaluations underestimate deployment risk and that temporally aware, missingness-adaptive, and timing-sensitive methods enhance clinical decision-making. This is the first study to unify PU learning, DAMS, and timing-aware RL across real-world datasets, establishing a foundation for robust ML in healthcare.
Ali et al. (Fri,) studied this question.