Background: Heart disease prediction is a critical task in clinical decision support, particularly in settings with high physician workloads. Interpretable and computationally efficient models are needed for transparent and practical implementation in healthcare environments.
Methods: This retrospective secondary analysis used a publicly available multi-country heart disease dataset (N = 918) derived from the UCI Machine Learning Repository. The primary outcome was binary heart disease status. A C4.5-based decision tree model with information gain-based pruning was developed using predefined predictors selected by gain ratio. Internal validation was performed using 5-fold cross-validation. External validation used a leave-one-country-out strategy to assess generalizability across national cohorts. Model performance was evaluated in terms of discrimination (accuracy, precision, recall, F1-score, and area under the curve [AUC]), calibration (Brier score and calibration plots), and computational cost (training and inference time). The model was compared against k-nearest neighbors (KNN), random forest, and multilayer perceptron (MLP) models using consistent parameter settings. Subgroup analyses by age and sex were also performed.
Results: In internal validation, the decision tree achieved an accuracy of 0.8366 ± 0.0329, an F1-score of 0.8319 ± 0.0344, an AUC of 0.8981 ± 0.0277, and a Brier score of 0.1206 ± 0.0182. The model had a low computational cost (training time: 0.0028 ± 0.0015 seconds). External validation revealed performance variability across countries, indicating sensitivity to distribution shifts. Subgroup analyses showed generally consistent performance across age and sex strata, although instability was observed in data-scarce subgroups.
Conclusion: The proposed C4.5-based model provides interpretable rule-based predictions with competitive discrimination, acceptable calibration, and low computational complexity. While performance varies across national cohorts, the model demonstrates potential as a transparent and resource-efficient prototype clinical decision support tool, warranting further prospective validation.
Chokphoemphun et al. (Fri,) studied this question.
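
To make the external validation scheme concrete, the following is a minimal sketch of leave-one-country-out splitting combined with the Brier score, written in plain Python. It is not the authors' implementation: the function names, the toy cohort labels "A", "B", and "C", the toy outcomes, and the prevalence-only baseline predictor are all assumptions chosen for illustration, and no C4.5 tree is fitted here.

```python
from collections import defaultdict


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


# Hypothetical toy data: one country label and one binary outcome per patient.
countries = ["A", "A", "B", "B", "C"]
y = [1, 0, 1, 1, 0]

splits = list(leave_one_group_out(countries))

# Placeholder model (an assumption, not the paper's tree): predict the
# training-set prevalence for every patient in the held-out country.
results = {}
for held_out, train_idx, test_idx in splits:
    prevalence = sum(y[i] for i in train_idx) / len(train_idx)
    y_test = [y[i] for i in test_idx]
    results[held_out] = brier_score(y_test, [prevalence] * len(test_idx))

for held_out, score in results.items():
    print(f"held-out country {held_out}: Brier score = {score:.4f}")
```

Because each country is scored by a model that never saw it during training, differences between the per-country scores reflect distribution shift between cohorts, which is the kind of variability the external validation reports. A lower Brier score indicates better-calibrated probabilities, with 0 being a perfect forecast.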