Abstract

Objective. To develop a scalable AI solution for population-based cancer prevention that enables effective detection of malignant neoplasms (cancer) using the minimal necessary dataset from electronic health records (EHRs): medical diagnosis and procedure codes. This system addresses the resource limitations of traditional cancer screening methods while maintaining high efficiency in patient risk stratification.

Methods. The proposed method combines gradient boosting with survival models. Over 700 predictors were constructed from raw EHR events, including sociodemographic characteristics, visit patterns, clinical history, and event frequencies by diagnosis group. A key feature is the use of population-based (Kaplan-Meier estimates) and individual (AFT model) risk characteristics as additional predictors for gradient boosting. Validation was conducted on data from over 2.5 million adult patients across 5 regions of the Russian Federation under the supervision of certified oncologists.

Results. Our method achieves an Average Precision (AP) of 0.228, outperforming modern deep learning and large language model solutions, the best of which reaches an AP of 0.193. When forming a risk group comprising 1% of the population, the proposed method can identify 3.7–5.4 times more patients with cancer using the same screening volume. In a 12-month retrospective study, our method increased the number of detected cancer cases by 91% and expanded regional cancer coverage by 36 percentage points compared to current preventive health examination processes. The proposed AI-based system demonstrates high scalability: processing data for a city of one million takes less than three hours and requires no high-performance servers.

Conclusions. This research presents a system for scalable population-based cancer prevention that uses exclusively medical diagnosis and procedure codes from EHRs. The system integrates naturally into existing medical workflows by directing at-risk patients to primary care physicians, who decide on oncologist referrals and additional examinations. Minimal data and computational resource requirements make the solution accessible for implementation across diverse healthcare systems, including remote regions with limited resources, opening new opportunities for improving the effectiveness of population-based cancer prevention.
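To illustrate the idea of feeding a population-based Kaplan-Meier estimate into a downstream classifier as an extra predictor, the following is a minimal sketch in plain Python. It is not the authors' implementation: the grouping variable (`group`), the field names (`months`, `cancer`), the horizon `HORIZON_MONTHS`, the feature name `km_risk`, and the toy records are all hypothetical, chosen only to show the mechanics. The abstract does not say how populations are stratified or which horizon is used.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve as a list of (time, S(time)) steps.

    durations: follow-up time per patient
    events:    1 if the event (here, a cancer diagnosis) was observed,
               0 if the patient was censored
    """
    observed = defaultdict(int)  # events at each time
    leaving = defaultdict(int)   # events + censorings at each time
    for t, e in zip(durations, events):
        leaving[t] += 1
        if e:
            observed[t] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed[t]
        if d:
            # Product-limit update: S(t) = S(t-) * (1 - d / n_at_risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def risk_at(curve, horizon):
    """Cumulative incidence 1 - S(horizon) read off the step curve."""
    survival = 1.0
    for t, s in curve:
        if t > horizon:
            break
        survival = s
    return 1.0 - survival


# Hypothetical toy records; "group" stands in for any population stratum.
HORIZON_MONTHS = 4  # assumed horizon, for illustration only
patients = [
    {"id": 1, "group": "A", "months": 2, "cancer": 1},
    {"id": 2, "group": "A", "months": 3, "cancer": 1},
    {"id": 3, "group": "A", "months": 3, "cancer": 0},
    {"id": 4, "group": "A", "months": 5, "cancer": 1},
    {"id": 5, "group": "A", "months": 8, "cancer": 0},
    {"id": 6, "group": "B", "months": 6, "cancer": 0},
    {"id": 7, "group": "B", "months": 6, "cancer": 0},
]

# Fit one curve per group, then read off the risk at the horizon.
by_group = defaultdict(list)
for p in patients:
    by_group[p["group"]].append(p)

group_risk = {
    g: risk_at(
        kaplan_meier([p["months"] for p in ps], [p["cancer"] for p in ps]),
        HORIZON_MONTHS,
    )
    for g, ps in by_group.items()
}

# Each patient inherits the group-level risk as one extra feature column.
features = [{"id": p["id"], "km_risk": group_risk[p["group"]]} for p in patients]
```

In a full pipeline, a column like `km_risk` would sit alongside the other predictors passed to the gradient boosting model. To avoid leaking outcomes into the features, the curves would need to be fitted on training data only.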
Philonenko et al. (Mon,) studied this question.