To provide a reference for the initial clinical diagnosis of prostate cancer, we identified predictors and established a risk prediction model by analyzing a national Chinese prostate tumor dataset. Missing values in a prostate cancer dataset provided by the National Population Health Data Center of China were filled in using the average value. Factors were screened using the Kruskal-Wallis test and binary logistic regression, and a multicollinearity analysis was performed on the variables. The cleaned data were divided into training and test datasets. Seven machine learning models and 3 traditional clinical models were constructed, and the 3 models with the best predictive efficacy were fused using a voting method. Accuracy, precision, F1 score, and area under the receiver operating characteristic curve were used to evaluate the models. Feature importance analysis was used to determine the importance of the variables in each model. The study included 2213 cases: 1107 in the training set and 1106 in the test set. The final prostate cancer model was built from back propagation neural network, random forest, and extreme gradient boosting algorithms and achieved an accuracy of 0.74, sensitivity of 0.78, F1 score of 0.77, and area under the curve of 0.80. The 5 key predictors of prostate cancer were the percentage of free prostate cancer-specific antigen and the levels of inorganic phosphorus, apolipoprotein A1, free prostate cancer-specific antigen, and total prostate cancer-specific antigen. No strong correlation was found among the variables. Our fused model performed well in assessing the risk of prostate cancer and could assist urologists in making appropriate treatment choices.
Geng et al. studied this question.
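The fusion and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether the vote was hard or soft, so soft voting (averaging predicted probabilities) is assumed here, and the function names, the 0.5 threshold, and the six toy cases are all hypothetical.

```python
from typing import Dict, List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities of several models, case by case."""
    n_models = len(prob_lists)
    return [sum(case) / n_models for case in zip(*prob_lists)]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn probabilities into 0/1 labels (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity (recall), precision, and F1 score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the share of positive/negative pairs
    in which the positive case scores higher (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer, 0 = no cancer (not from the study).
y_true = [1, 1, 1, 0, 0, 0]
bpnn_probs = [0.9, 0.6, 0.4, 0.2, 0.6, 0.1]  # back propagation neural network
rf_probs = [0.8, 0.7, 0.3, 0.3, 0.2, 0.2]    # random forest
xgb_probs = [0.7, 0.5, 0.2, 0.1, 0.4, 0.0]   # extreme gradient boosting

fused = soft_vote([bpnn_probs, rf_probs, xgb_probs])
metrics = evaluate(y_true, classify(fused))
auc_value = auc(y_true, fused)
print(metrics, auc_value)
```

Averaging probabilities keeps a continuous risk score, which is what the area under the curve is computed on. A majority vote over class labels would be an equally valid reading of "voting method".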