Diabetes is a rapidly growing global health concern, and early detection and personalized risk assessment are critical for preventing severe complications. However, current predictive approaches often struggle with robustness, scalability, and class imbalance in clinical datasets. To address these challenges, this study proposes a dual-pipeline diabetes risk prediction framework that processes (i) raw tabular clinical data using CatBoost and multilayer perceptrons (MLPs), and (ii) tabular-to-image representations using Convolutional Neural Networks (CNNs) trained on images generated through the Novel Algorithm for Convolving Tabular Data (NCTD) and the Image Generator for Tabular Data (IGTD). The two pipelines are integrated through multiple ensemble strategies, which combine complementary feature representations to enhance predictive stability and generalization. Class imbalance is mitigated through resampling and class-weighting techniques, which further improves model robustness. Our results indicate that the stacking ensemble achieved the most balanced performance (Accuracy = 0.8234, Macro-F1 = 0.6875). In contrast, a five-model stacking ensemble achieved a slightly higher accuracy (0.8327) at the cost of reduced Macro-F1, indicating lower sensitivity to minority-class patterns. Overall, the results show that although tabular-to-image transformation offers advantages in privacy preservation and feature abstraction, tabular tree-based models and deep learning models remain more reliable for equitable prediction. These findings highlight the value of integrating both pipelines to achieve a scalable and robust framework for diabetes risk assessment.

• We develop a dual-pipeline framework combining tabular ML models (CatBoost, MLP) and image-based CNNs for accurate diabetes risk prediction.
• We apply tabular-to-image transformation algorithms to enhance feature abstraction and data privacy.
• We handle class imbalance through Edited Nearest Neighbor (ENN) resampling and class weighting to improve model fairness and minority-class sensitivity.
• We design ensemble models (random forest stacking and majority voting), achieving up to 83% accuracy with balanced performance across classes.
• We demonstrate that integrating tabular and image-based learning provides a scalable and privacy-preserving approach for clinical risk assessment.
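The trade-off reported above, where a higher accuracy comes with a lower Macro-F1, can be illustrated with a small sketch. The labels and predictions below are hypothetical toy values, not data or results from this study, and the function names are chosen only for illustration. The sketch assumes the standard definition of Macro-F1 as the unweighted mean of per-class F1 scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy set: 8 negatives (0) and 2 positives (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical model that always predicts the majority class.
pred_majority = [0] * 10

# Hypothetical model that finds both positives but raises 3 false alarms.
pred_sensitive = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, pred in [("majority", pred_majority), ("sensitive", pred_sensitive)]:
    print(name, round(accuracy(y_true, pred), 4), round(macro_f1(y_true, pred), 4))
```

In this toy case the majority-only predictor has the higher accuracy but the lower Macro-F1, because its F1 on the minority class is zero. This is the same qualitative pattern as the one the abstract describes for the five-model ensemble, though the actual cause there may differ.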
Wang et al. (Sun,) studied this question.