Introduction: Type 2 diabetes mellitus (T2DM) and cardiovascular diseases (CVDs) are major non-communicable disorders that contribute significantly to global morbidity and mortality, particularly in the North Indian population. This study leverages machine learning (ML) to assess the risk of developing T2DM and CVDs using demographic, biochemical, and lifestyle parameters collected from an Indian cohort.
Methods: A cross-sectional dataset comprising clinical and biochemical features was analyzed using supervised ML algorithms, including Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Extreme Gradient Boosting (XGBoost). Feature normalization, correlation analysis, and hyperparameter tuning were performed to optimize model performance. Models were evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC).
Results: The ML models effectively classified individuals as being at high or low risk of T2DM and CVD. Ensemble-based models, such as Random Forest and XGBoost, achieved superior predictive performance compared to baseline algorithms, indicating their suitability for population-level screening and early risk identification.
Discussion: The findings highlight the potential of ML for developing low-cost, data-driven decision-support tools for early identification of chronic disease risk. Population-specific modeling and feature interpretability are essential for improving the generalizability and clinical translation of predictive systems.
Result:
Diabetes Prediction: The AdaBoost model achieved an AU-ROC score of 86.2% using non-laboratory data, improving to 95.7% with laboratory data.
CVD Prediction: The Weighted Ensemble Model attained an AU-ROC of 83.1% with non-laboratory data, which increased to 93.7% when laboratory data were included.
Key Predictors: For diabetes, significant factors included age, waist circumference, sodium intake, and ethnicity, while CVD risk was primarily associated with cholesterol, triglycerides, and physical characteristics. The overlap in predictors suggests common underlying pathophysiological pathways.
Conclusion: This study establishes an interpretable ML-based framework capable of predicting T2DM and CVD risk among North Indian individuals. The approach may support precision-prevention strategies and guide targeted public-health interventions.
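Illustrative sketch: The evaluation metrics named in the Methods (accuracy, precision, recall, F1-score, and AUC) can be computed as in the following minimal Python example. This is not the study's code. The labels and risk scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are hypothetical. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration; NOT from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                       # 1 = high risk, 0 = low risk
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]     # hypothetical model risk scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]         # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On these toy values, accuracy, precision, recall, and F1 all equal 0.75, and the AUC is 0.9375. Unlike the other four metrics, AUC does not depend on the decision threshold, which is one reason it is a common choice for comparing screening models.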
Jyotsna et al. (Fri,) studied this question.