Suicidal ideation is often assessed using a single self-report item in routine screening. We developed a model that combines machine learning with symptom-network analytics to infer an auxiliary signal relevant to suicidal ideation from routine depressive-symptom data. Data from adults in the National Health and Nutrition Examination Survey (N = 44,922) were used to predict ideation (PHQ-9 item 9 ≥ 1) under three specifications: (1) PHQ-8 total score; (2) eight PHQ-8 items; and (3) those items plus 37 network features (8 centrality measures, 28 edges, and 1 density). The data were split 70/30, and models were trained using 10-fold cross-validation with fold-internal class balancing. The primary metric was the area under the precision-recall curve (PR AUC). External validation used five independent datasets (total N = 808,023), with normalized PR AUC used for comparison. Item-level models outperformed the PHQ-8 total-score baseline. With network features, XGBoost yielded the strongest performance. The optimized network-augmented XGBoost met the prespecified screening criterion (recall/sensitivity ≥ 0.80; specificity ≥ 0.50) and achieved a PR AUC of 0.37, improving on the total-score baseline (0.32). The analysis highlighted the importance of the centrality and severity of depressed mood and worthlessness/guilt, the overall network density, and the edges linking depressed mood with worthlessness/guilt and sleep disturbance with psychomotor change. Across the five external datasets, normalized PR AUCs ranged from 0.32 to 0.51. The cross-sectional design limits causal inference. Thresholds prioritize first-line screening over confirmation. Integrating symptom-network features with machine learning enhanced interpretability while maintaining discrimination relative to item-only models and outperforming the PHQ-8 total-score baseline. The optimized model satisfies pragmatic screening criteria and is suited for first-line case finding.
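The feature count and the screening criterion stated above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy labels and scores, and the rule of taking the highest threshold that satisfies both bounds are assumptions made for illustration only. The abstract states only the 8 + 28 + 1 = 37 feature breakdown and the bounds (sensitivity ≥ 0.80, specificity ≥ 0.50).

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def network_feature_count(n_items: int = 8) -> int:
    """One centrality per item + one edge per item pair + one density value."""
    n_edges = len(list(combinations(range(n_items), 2)))  # 28 pairs for 8 items
    return n_items + n_edges + 1


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify score >= threshold as positive; assumes both classes are present."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        pred = s >= threshold
        if y == 1:
            tp += pred
            fn += not pred
        else:
            fp += pred
            tn += not pred
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    min_sens: float = 0.80,  # bound taken from the abstract
    min_spec: float = 0.50,  # bound taken from the abstract
) -> Optional[float]:
    """Hypothetical rule: highest observed score that meets both bounds, else None."""
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(y_true, scores, t)
        if sens >= min_sens and spec >= min_spec:
            return t
    return None


# Toy data (made up): 1 = ideation endorsed (PHQ-9 item 9 >= 1), 0 = not endorsed.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.3, 0.25, 0.1, 0.05]

print(network_feature_count())         # 8 centralities + 28 edges + 1 density
print(pick_threshold(y_true, scores))  # threshold meeting both bounds, if any
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why a criterion of this shape favors first-line case finding over confirmation.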
• Screening suicidal ideation from depression data with network-augmented ML
• Network features atop PHQ-8 improve interpretability without performance loss
• Externally validated across five large cohorts
Kim et al. (Sun,) studied this question.