Objective Klebsiella pneumoniae liver abscess (KPLA) carries a non-negligible risk of recurrence after treatment, imposing a substantial burden on affected patients and on healthcare systems worldwide. To address this issue, we developed and validated a machine learning model that integrates patients' clinical data and laboratory indicators for early prediction of KPLA recurrence risk. Methods This multicenter retrospective study included 829 KPLA patients from three tertiary hospitals (2016–2024). Data from 722 patients at the Affiliated Hospital of Chengde Medical University and Kailuan General Hospital were divided into a training set (n = 506) and an internal testing set (n = 216), and data from the 107 patients at the First Hospital of Qinhuangdao formed the quasi-external validation set. Twenty-four candidate variables were collected, and 9 key predictors were retained based on univariate analysis, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and the Boruta algorithm. Seven machine learning models were constructed, with the logistic regression model (LM) as the baseline control. The models were compared by accuracy, area under the curve (AUC), decision curve analysis, and calibration curve analysis, and extreme gradient boosting (XGBoost) was selected as the final prediction model. SHAP (SHapley Additive exPlanations) analyses were conducted to enhance model interpretability, and a web-based tool was developed for use in clinical practice. Results The hyperparameter-optimized XGBoost model performed best, with AUC values of 0.936 (95% CI: 0.914–0.959), 0.868 (95% CI: 0.799–0.938), and 0.904 (95% CI: 0.819–0.988) in the training, internal testing, and quasi-external validation sets, respectively.
The intersection of the three feature selection approaches described above yielded 9 key predictors: age, type 2 diabetes mellitus, malignant neoplasm, biliary disease, fibrinogen level, procalcitonin level, multiple abscesses, septic shock, and Sequential Organ Failure Assessment (SOFA)-2 score. The web-based tool enabled assessment of individualized recurrence risk. Conclusions The XGBoost-based prediction model integrates clinical and laboratory indicators to accurately predict KPLA recurrence risk, with good generalizability and interpretability. The accompanying web-based tool gives clinicians a practical decision-support application for identifying high-risk patients early and implementing personalized interventions.
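The AUC values with 95% confidence intervals reported above can be illustrated with a minimal sketch. It computes a rank-based AUC and a percentile bootstrap interval in plain Python. The abstract does not state how the intervals were obtained, so the bootstrap here is an assumption, and the function names (`auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical, not study data.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point-estimate AUC plus a percentile bootstrap (1 - alpha) interval.
    Assumption: percentile bootstrap; the study may have used another method."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


if __name__ == "__main__":
    # Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
    y_true = [0, 0, 1, 1, 0, 1, 0, 1]
    y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]
    point, lo, hi = bootstrap_auc_ci(y_true, y_prob)
    print(f"AUC = {point:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

The pairwise comparison is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; a larger dataset would call for a rank-sum formulation or a library routine.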
Zhang et al. (Wed,) studied this question.