May 16, 2026 · Open Access

Heterogeneous pathways to depressive and anxiety disorders: A cluster-based predictive study in a nationwide longitudinal cohort

Key Points

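The study aims to identify heterogeneous risk pathways to depressive and anxiety disorders using a hybrid unsupervised-supervised ("cluster-then-predict") machine learning framework.
Analyzed cohort data from 15,897 Japanese adults who screened negative at baseline (K6 < 13), using 169 baseline demographic, psychosocial, lifestyle, and behavioral variables.
Performed hierarchical clustering to derive data-driven subgroups, then trained a Random Forest model within each cluster to predict screen-positive disorders at 6-month follow-up (K6 ≥ 13).
Used SHapley Additive exPlanations (SHAP) to interpret predictors within each cluster.
Overall 6-month incidence of screen-positive depressive and anxiety disorders was 6.23%.
A five-cluster solution identified two high-risk subgroups: older adults with poor quality of life (12.9%) and working parents with work-family overload (29.8%).
Compared with a global model, the cluster-then-predict framework showed broadly similar overall performance and performed better in the highest-risk subgroup. It also yielded more differentiated predictor profiles. Loneliness and quality of life dominated in moderate-adversity clusters. Lifestyle disruption dominated in the late-life subgroup. Alcohol dependence and work-family burden dominated in the working-parent subgroup.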
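The outcome definition reduces to two K6 thresholds applied to each participant. First, include participants only if they screen negative at baseline (K6 < 13). Then count an incident case when they screen positive at follow-up (K6 ≥ 13). The minimal Python sketch below makes that rule explicit. The class and function names (`Participant`, `at_risk`, `incident_case`, `incidence`) are hypothetical illustrations, not the authors' code. The 0 to 24 range check reflects the standard K6 scoring of six items at 0 to 4 each. For scale, 6.23% of 15,897 is roughly 990 incident cases. That is a simple back-calculation from the rounded percentage, not a count reported here.

```python
from dataclasses import dataclass
from typing import Iterable

K6_CUTOFF = 13          # screen-positive threshold used in the study (K6 >= 13)
K6_MIN, K6_MAX = 0, 24  # six items scored 0-4 each


@dataclass(frozen=True)
class Participant:
    """Hypothetical record holding the two K6 totals the outcome depends on."""
    k6_baseline: int  # K6 total at baseline (August-September 2020)
    k6_followup: int  # K6 total at 6-month follow-up (February-March 2021)

    def __post_init__(self) -> None:
        # Reject totals outside the valid K6 range.
        for score in (self.k6_baseline, self.k6_followup):
            if not K6_MIN <= score <= K6_MAX:
                raise ValueError(f"K6 total must be in [0, 24], got {score}")


def at_risk(p: Participant) -> bool:
    """Included in the analytic cohort: negative screen at baseline."""
    return p.k6_baseline < K6_CUTOFF


def incident_case(p: Participant) -> bool:
    """Counted as an incident case: positive screen at follow-up."""
    return p.k6_followup >= K6_CUTOFF


def incidence(participants: Iterable[Participant]) -> float:
    """Share of at-risk participants who screen positive at follow-up."""
    cohort = [p for p in participants if at_risk(p)]
    if not cohort:
        raise ValueError("no participants with a negative baseline screen")
    # Summing booleans counts the incident cases.
    return sum(incident_case(p) for p in cohort) / len(cohort)
```

Here is a toy example. `incidence([Participant(5, 14), Participant(8, 3), Participant(15, 16), Participant(2, 13)])` excludes the third participant, who screened positive at baseline. Two of the remaining three cross the cutoff at follow-up, so the result is 2/3. The same function can be applied within any subgroup.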

Abstract

BACKGROUND: Early prediction of depressive and anxiety disorders is challenging because risk pathways are substantially heterogeneous. Conventional machine-learning models trained on aggregated populations may obscure subgroup-specific mechanisms and limit interpretability for prevention. We evaluated whether a hybrid unsupervised-supervised framework can identify meaningful subgroups and yield more interpretable risk prediction.
METHODS: We analyzed cohort data from 15,897 Japanese adults who completed the baseline (August-September 2020) and 6-month follow-up (February-March 2021) surveys. None screened positive for depressive or anxiety disorders at baseline (Kessler Psychological Distress Scale [K6] score < 13). Using 169 baseline demographic, psychosocial, lifestyle, and behavioral variables, we performed hierarchical clustering to derive data-driven subgroups. Within each cluster, we trained Random Forest models to predict incident screen-positive depressive and anxiety disorders at follow-up (K6 ≥ 13). We interpreted predictors using SHapley Additive exPlanations (SHAP).
RESULTS: The overall 6-month incidence was 6.23%. A five-cluster solution revealed two high-risk subgroups: an older-adult profile with poor quality of life (12.9%) and a working-parent profile characterized by work-family overload (29.8%). Compared with a global model trained on the full sample, the cluster-then-predict framework showed broadly similar overall performance. It performed better in the highest-risk subgroup and revealed more differentiated predictor profiles. Loneliness, health-related quality of life, happiness, and personality traits predominated in clusters with moderate adversity. Lifestyle disruption (sleep, diet, and irregular routines) characterized the high-risk late-life subgroup. Alcohol dependence and work-family burden characterized the high-risk working-parent subgroup.
CONCLUSIONS: Addressing risk-factor heterogeneity before prediction may enable more interpretable, context-tailored prevention strategies.
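The cluster-then-predict design can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' pipeline. The summary does not report the linkage, distance metric, preprocessing, or Random Forest hyperparameters. Ward linkage on standardized numeric features, 200 trees, and balanced class weights are therefore all assumptions, as are the helper names `cluster_then_predict` and `fit_global_model`. For mixed-type survey variables, a Gower-type distance with average linkage would be a plausible alternative. The study interpreted predictors with SHAP. To stay dependency-light, this sketch ranks features by scikit-learn's impurity-based `feature_importances_`, which is a different and cruder measure. The `shap` package's `TreeExplainer` is the usual SHAP route for tree ensembles.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def _forest(random_state: int) -> RandomForestClassifier:
    # Balanced class weights because incident cases are rare (about 6% overall).
    return RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=random_state
    )


def cluster_then_predict(X, y, n_clusters=5, random_state=0, top_k=3):
    """Step 1: hierarchical clustering. Step 2: one Random Forest per cluster."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    # Standardize so no single variable dominates the Euclidean distances Ward uses.
    X_std = StandardScaler().fit_transform(X)
    labels = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward"
    ).fit_predict(X_std)

    summary = {}
    for k in np.unique(labels):
        mask = labels == k
        y_k = y[mask]
        entry = {"n": int(mask.sum()), "incidence": float(y_k.mean()),
                 "model": None, "top_features": []}
        # A classifier needs both outcome classes present in the cluster.
        if np.unique(y_k).size == 2:
            # Trees are scale-invariant, so the forest is fit on the raw features.
            model = _forest(random_state).fit(X[mask], y_k)
            ranked = np.argsort(model.feature_importances_)[::-1]
            entry["model"] = model
            entry["top_features"] = ranked[:top_k].tolist()
        summary[int(k)] = entry
    return labels, summary


def fit_global_model(X, y, random_state=0):
    """Comparison baseline: one Random Forest trained on the pooled sample."""
    return _forest(random_state).fit(
        np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    )
```

The helper clusters and fits in-sample only. Comparing it fairly with the global model on held-out data would require assigning new participants to clusters. `AgglomerativeClustering` has no `predict` method, so one option is nearest cluster centroid. The summary does not describe how the study handled this. Standard agglomerative clustering also needs memory that grows quadratically with sample size, which is noticeable at roughly 16,000 participants.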
