Hepatitis B virus (HBV) infection remains a public health concern in China, especially among middle-aged and older adults. Accurate prediction models can assist in early detection and targeted interventions. However, predicting HBV infection from population-level data is challenging because its relatively low prevalence produces highly imbalanced datasets that undermine the performance of traditional predictive models. This study aimed to develop a machine learning model to predict HBV infection using an ensemble approach designed to handle class imbalance. We conducted a cross-sectional analysis of participants aged 45 years and older from the 2011 China Health and Retirement Longitudinal Study. To address the class imbalance in HBV infection, we applied the synthetic minority oversampling technique (SMOTE). We developed machine learning models, including logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), adaptive boosting classifier (AdaBoost), extreme gradient boosting (XGBoost), and a stacking ensemble model (SEM), to predict HBV infection and identify predictors. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AP), accuracy, sensitivity, specificity, precision, F1 score, and Brier score. Shapley additive explanation (SHAP) analysis was applied to interpret the magnitude and direction of each predictor’s contribution to the model’s output. The final analysis included 8,954 participants, of whom 0.64% self-reported HBV infection, confirming that the data were highly imbalanced. The SEM achieved balanced performance (AUC: 87.93%, AP: 18.85%, F1: 15.66%, Brier score: 0.03). The machine learning models also identified several new important predictors, including birth season, depression, and knee height.
An ensemble machine learning model incorporating techniques to manage data imbalance can accurately predict HBV infection in Chinese middle-aged and older adults. The novel predictors we identified offer new insights into HBV risk in ageing populations and support the development of more targeted prevention strategies.
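As a rough illustration of two ideas named in the abstract, the sketch below shows how SMOTE-style oversampling creates synthetic minority-class points by interpolation, and how the Brier score and average precision are computed. It is not the study's code: the function names (`smote`, `brier_score`, `average_precision`), the toy data, and the parameter values are all hypothetical, and the stacking ensemble itself is not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    Assumes at least two minority samples (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def average_precision(y_true, y_prob):
    """Mean of the precision values at the rank of each true positive.
    Rank-based definition; tied scores are not given special handling."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy minority class: four points on the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote(minority, n_new=6, k=2)

# Toy labels and predicted probabilities for the two metrics.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
brier = brier_score(y_true, y_prob)
ap = average_precision(y_true, y_prob)
```

In practice, oversampling of this kind is normally applied to the training data only, so that evaluation metrics are computed on data with the original class balance.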
Sun et al. studied this question.