February 28, 2026 · Open Access

Diabetes Risk Modeling through Tabular-to-Image Transformations and Ensemble Learning

Key points

Developed a dual-pipeline framework that integrates tabular and image-based data for robust diabetes risk assessment.
Used CatBoost and multilayer perceptrons (MLPs) to model the raw tabular clinical data.
Trained convolutional neural networks (CNNs) on images generated from the tabular data with NCTD and IGTD.
Handled class imbalance with resampling and class weighting.
Combined the two pipelines through stacking ensemble strategies to improve prediction stability and generalization.
The stacking ensemble gave the most balanced performance: accuracy 0.8234 and macro-F1 0.6875.
A five-model stacking ensemble reached slightly higher accuracy (0.8327) but lower macro-F1, indicating lower sensitivity to minority-class patterns.
Tabular tree-based models and deep learning models remain more reliable for equitable prediction, despite the privacy and feature-abstraction advantages of tabular-to-image transformation.
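
The trade-off between accuracy and macro-F1 reported above can be illustrated with a minimal sketch. The labels and the two hypothetical predictors below are toy values chosen for illustration, not data or models from the study; the point is only that a predictor can score higher on accuracy while scoring lower on macro-F1 when one class is rare.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced labels (assumption): 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2

# Hypothetical predictor A: always predicts the majority class.
always_negative = [0] * 10

# Hypothetical predictor B: catches both positives, at the cost of 3 false alarms.
minority_aware = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

acc_a, f1_a = accuracy(y_true, always_negative), macro_f1(y_true, always_negative)
acc_b, f1_b = accuracy(y_true, minority_aware), macro_f1(y_true, minority_aware)

print(f"A: accuracy={acc_a:.3f} macro-F1={f1_a:.3f}")
print(f"B: accuracy={acc_b:.3f} macro-F1={f1_b:.3f}")
```

In this toy case predictor A wins on accuracy and predictor B wins on macro-F1, which is why the two metrics can rank ensembles differently.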

Abstract

Diabetes is a rapidly growing global health concern, and early detection and personalized risk assessment are critical for preventing severe complications. However, current predictive approaches often struggle with robustness, scalability, and class imbalance in clinical datasets. To address these challenges, this study proposes a dual-pipeline diabetes risk prediction framework that processes (i) raw tabular clinical data using CatBoost and multilayer perceptrons (MLPs), and (ii) tabular-to-image representations using Convolutional Neural Networks (CNNs) trained on images generated through the Novel Algorithm for Convolving Tabular Data (NCTD) and Image Generator for Tabular Data (IGTD). The two pipelines are integrated through multiple ensemble strategies, which combine complementary feature representations to enhance predictive stability and generalization. Furthermore, class imbalance is systematically mitigated through resampling and class-weighting techniques, further contributing to improved model robustness. Our results indicate that the stacking ensemble achieved the most balanced performance (Accuracy = 0.8234, Macro-F1 = 0.6875). In contrast, a five-model stacking ensemble achieved a slightly higher accuracy (0.8327) at the cost of reduced macro-F1, indicating lower sensitivity to minority-class patterns. Overall, results show that although tabular-to-image transformation offers advantages in privacy preservation and feature abstraction, tabular tree-based models and deep learning models remain more reliable for equitable prediction. These findings highlight the value of integrating both pipelines to achieve a scalable and robust framework for diabetes risk assessment.

• We develop a dual-pipeline framework combining tabular ML models (CatBoost, MLP) and image-based CNNs for accurate diabetes risk prediction.
• We apply tabular-to-image transformation algorithms to enhance feature abstraction and data privacy.
• We apply class imbalance handling through Edited Nearest Neighbor (ENN) resampling and class weighting to improve model fairness and minority class sensitivity.
• We design ensemble models (random forest stacking and majority voting), achieving up to 83% accuracy with balanced performance across classes.
• We demonstrate that integrating tabular and image-based learning provides a scalable and privacy-preserving approach for clinical risk assessment.
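
As a rough illustration of two of the ingredients named above, class weighting and majority voting, the following sketch uses only the Python standard library. It is not the authors' implementation: the weighting formula is the common "balanced" heuristic (n_samples / (n_classes * class_count)), which the abstract does not confirm was used, and the three per-model prediction lists are hypothetical stand-ins for the outputs of a CatBoost, an MLP, and a CNN model.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights (assumed heuristic, see lead-in).
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models predicted.

    Ties go to the label that appears first in model order.
    """
    voted = []
    for sample_votes in zip(*predictions_per_model):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Toy training labels (assumption): 8 negatives, 2 positives.
y_train = [0] * 8 + [1] * 2
weights = balanced_class_weights(y_train)
print(weights)

# Hypothetical predictions from three base models on four test samples.
catboost_pred = [0, 1, 1, 0]
mlp_pred = [0, 1, 0, 0]
cnn_pred = [1, 1, 0, 0]

ensemble_pred = majority_vote([catboost_pred, mlp_pred, cnn_pred])
print(ensemble_pred)
```

Stacking differs from the voting shown here in that a meta-learner (a random forest, according to the highlights) is trained on the base models' outputs instead of counting votes.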
