Systematic review and random-effects meta-analysis of PHQ-9 factor structure, measurement invariance, and internal consistency.

# Factor structure and measurement invariance of the Patient Health Questionnaire-9 (PHQ-9): a systematic review and random-effects meta-analysis

## Background and rationale

The Patient Health Questionnaire-9 (PHQ-9) is a widely used 9-item self-report depression instrument. Despite its ubiquity in primary care, epidemiologic surveys, digital health, and implementation research, its latent factor structure remains contested across one-factor (1F) general-depression, two-factor (2F) correlated cognitive-affective vs. somatic, and hierarchical or bifactor specifications. These disagreements affect score interpretation, the defensibility of subscale reporting, measurement invariance claims, and cross-population/language comparability of PHQ-9 totals. No recent transparent meta-analysis pools field-wide fit and reliability statistics or formally tests whether different model classes diverge.

## Objectives

We aim to (1) pool five widely reported fit and reliability indices (CFI, TLI, RMSEA, SRMR, Cronbach's α) for the PHQ-9 from a conservatively curated structural-validation evidence base, (2) test whether one-factor and two-factor specifications diverge using Cochran's Q-between and a Wald-equivalent meta-regression with model class as a binary moderator, and (3) apply a COSMIN-inspired abstract-level methodological-quality screen with a sensitivity re-pool that excludes Inadequate-rated studies.

## Eligibility criteria (PICOS)

- **Population:** Adult populations (general or clinical) administered the standard 9-item PHQ.
- **Intervention/Exposure:** Not applicable (psychometric / structural validity review).
- **Comparator:** Across model classes (1F vs. 2F correlated vs. hierarchical/bifactor).
- **Outcomes:** Confirmatory factor analysis fit indices (CFI, TLI, RMSEA, SRMR) and internal consistency (Cronbach's α).
- **Study designs:** Original empirical structural-validation studies reporting factor-analytic or psychometric evidence on the standard 9-item PHQ.
- **Time horizon:** Records published from January 1, 2010 onward.
- **Language:** English-language reports (with Korean and Spanish reports retained where verifiable abstracts/full texts are obtainable).

## Search strategy

**v1.5 baseline (already executed):** Single-database PubMed search (January 1, 2010 to April 17, 2026) using free-text terms targeting "PHQ-9", "Patient Health Questionnaire-9", and factor-analytic/psychometric concepts; no MeSH-controlled vocabulary; no second reviewer; no grey-literature search; no trial-registry query. Yield: 584 records.

**v2 multi-database expansion (planned, library-pending):** PRISMA-S-compliant search of Embase, OVID PsycINFO, CINAHL (EBSCO), Education Source, APA PsycNet, Cochrane CENTRAL, LILACS, Scopus, and regional indexes (KoreaMed, SciELO). Library-mediated database access requested (2026-04-18 to 2026-04-27).

**Phase 1·5 supplementary scoping cross-check (already executed):** Preprint-repository searches across bioRxiv/medRxiv and supplementary academic indexes. Yielded 18 additional candidates → 9 PubMed-verified PMID lock-ins:

- Kliem 2024, PMID 39726913
- Doi 2018, PMID 30024876
- Tibubos/Beutel 2021, PMID 33952234
- Rosario 2023, PMID 36865076
- Rahman/Mehareen 2022, PMID 35675375
- Shin 2020, PMID 32354339
- Vu 2022, PMID 35990070
- Hall 2020, PMID 33248710 (paywall, ILL pending)
- Lamela 2020, PMID 32697702 (paywall, ILL pending)

## Screening and data extraction

**v1.5 baseline (executed):** Single-reviewer screening at title/abstract level (no inter-rater reliability statistics). Round 1 (N=584): INCLUDE 369 / BORDERLINE 121 / EXCLUDE 94. Round 2 (N=369): INCLUDEMETA 47 / INCLUDESEED 16 / INCLUDEQUAL 157 / EXCLUDE 149. Final extraction set: 67 records (post seed-merge).
Manuscript-stage curation (subjective, manual judgment based on direct relevance to PHQ-9 latent dimensional structure) retained 33 PHQ-9-focused structural-validation studies (the curated PMID list).

**v2 (planned):** Dual-reviewer formal screening with full inter-rater reliability statistics (Cohen's κ); discrepancies resolved by consensus or third-reviewer adjudication.

**Mega-sample exclusion rule:** Two studies with N>100,000 (Flores-Cohaila 2026, PMID 41619940, N=318,681 administrative-registry; Nouwen 2021, PMID 34139403, N=159,801 analyst-summed multi-cohort aggregate) were excluded from the primary REML pool because the vᵢ = 1/N sampling-variance approximation has no coherent sampling interpretation in either case.

**Important transparency note:** This rule was finalised during analysis rather than pre-specified before the search, and should be read as a transparent post-hoc analytic decision.

## Quantitative synthesis

Studies with N ≥ 17 and at least one extractable fit or reliability statistic are eligible for pooling (the N ≥ 17 floor is a pragmatic cutoff chosen to keep the Fisher-z α transformation numerically stable at vᵢ = 1/(N−3); flagged as a reviewer-noted arbitrary threshold rather than a principled power analysis). Random-effects meta-analyses are fitted using REML estimation:

- **CFI and TLI:** direct scale; sampling variance approximated as vᵢ = 1/N.
- **RMSEA and SRMR:** direct scale; vᵢ = 1/N.
- **Cronbach's α:** Fisher-z transform; vᵢ = 1/(N−3); back-transformed via tanh.

τ² estimated via REML. Pooled 95% CIs computed using standard normal-approximation Wald intervals. Heterogeneity quantified using Cochran's Q and I².
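
The pooling rules above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the review's actual analysis code: the function names (`reml_pool`, `pool_direct`, `pool_alpha`) are hypothetical, τ² is obtained with a simple fixed-point form of the REML estimating equation (truncated at zero), I² uses the classical (Q − df)/Q definition, and the input values are illustrative placeholders rather than extracted study data.

```python
import math


def reml_pool(y, v, tol=1e-10, max_iter=1000):
    """Random-effects pool of estimates y with sampling variances v.

    tau^2 solves the REML estimating equation via fixed-point iteration
    (an assumed solver choice), truncated at zero.
    """
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sw2 = sum(wi * wi for wi in w)
        step = sum(wi * wi * ((yi - mu) ** 2 - vi)
                   for wi, yi, vi in zip(w, y, v)) / sw2 + 1.0 / sw
        new_tau2 = max(0.0, step)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break

    # Final random-effects weights, pooled estimate, and Wald 95% CI.
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    se = math.sqrt(1.0 / sw)

    # Cochran's Q and I^2 use fixed-effect (1/v) weights.
    wf = [1.0 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(wf, y)) / sum(wf)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(wf, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"mu": mu, "lo": mu - 1.96 * se, "hi": mu + 1.96 * se,
            "tau2": tau2, "Q": q, "I2": i2}


def pool_direct(values, ns):
    """CFI, TLI, RMSEA, SRMR: direct scale with v_i = 1/N."""
    return reml_pool(values, [1.0 / n for n in ns])


def pool_alpha(alphas, ns):
    """Cronbach's alpha: Fisher-z with v_i = 1/(N-3), back-transformed via tanh."""
    fit = reml_pool([math.atanh(a) for a in alphas],
                    [1.0 / (n - 3) for n in ns])
    return math.tanh(fit["mu"]), math.tanh(fit["lo"]), math.tanh(fit["hi"])


# Illustrative placeholder inputs only (not data from the curated studies).
print(pool_direct([0.96, 0.93, 0.98], [850, 1200, 640]))
print(pool_alpha([0.80, 0.85, 0.90], [200, 300, 400]))
```

Because the back-transformation is applied to the interval endpoints, the resulting CI for α is asymmetric around the pooled estimate, which matches the shape of the α interval reported in the baseline table.
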
The conventional acceptable ranges used for narrative interpretation (CFI/TLI ≥0·95, RMSEA ≤0·06, SRMR ≤0·08) follow the widely cited Hu (ii) Cochran's Q-between with a Wald-equivalent fixed-effect meta-regression treating model class as a binary moderator (1F=0, 2F=1; α analysed on Fisher-z scale); (iii) leave-one-out sensitivity recomputation for each of the five primary indices.

## Risk of bias

**v1.5 baseline:** No full-text-based risk-of-bias appraisal (COSMIN full checklist, ROBIS, or AXIS). Four abstract-level COSMIN-inspired signals (sample-size adequacy, model-structure disclosure, fit-index reporting, reliability reporting) applied to all 33 curated studies, with a sensitivity re-pool excluding Inadequate-rated studies.

**v2 (planned):** Full COSMIN Risk of Bias checklist for measurement properties, applied to full-text records by two independent reviewers.

## Pre-existing analytic baseline (v1.5, already completed)

This registration documents an **upgrade** of an existing v1.5 rapid review to a fully PRISMA-S-compliant, multi-database, dual-reviewer systematic review (v2). The v1.5 rapid review is preserved as a separate transparency artifact and will be cited in the v2 manuscript.

**All v1.5 numerical results below are pre-existing and fixed at the time of this registration:**

| Index | Pooled estimate (95% CI) | k | I² |
|---|---|---|---|
| CFI | 0·966 (0·949 to 0·983) | 9 | 87·0% |
| TLI | 0·957 (0·925 to 0·989) | 4 | 67·0% |
| RMSEA | 0·066 (0·050 to 0·081) | 11 | 82·5% |
| SRMR | 0·041 (0·031 to 0·051) | 5 | 0·0% |
| Cronbach's α | 0·834 (0·742 to 0·895) | 6 | 99·4% |

A Phase 2 pre-registration simulation with 9 additional PubMed-verified candidates yielded a TLI shift of −0·021 (RMSEA +0·007, α +0·017), driven primarily by the Kliem 2024 outlier (RMSEA = 0·17). The qualitative conclusion that "the PHQ-9 is structurally credible across adult populations" is robust across the v1.5 baseline and the Phase 2 simulation pool.
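
For readers who want to see why Cochran's Q-between and a fixed-effect meta-regression on a binary model-class moderator (1F=0, 2F=1) are described as Wald-equivalent, the following sketch computes both and shows that Q-between equals the squared Wald z for the moderator slope. It is an illustration under assumptions: the function name `q_between_binary` is hypothetical, weights are inverse sampling variances with vᵢ = 1/N, and the inputs are placeholder values rather than results from the curated studies.

```python
import math


def q_between_binary(y, v, is_2f):
    """Fixed-effect subgroup test for a binary moderator (1F=0, 2F=1).

    Returns Q-between (chi-square, 1 df), the moderator slope with its
    Wald z, and the p-value shared by both tests.
    """
    w = [1.0 / vi for vi in v]

    def pooled(idx):
        idx = list(idx)
        if not idx:
            raise ValueError("each model class needs at least one study")
        sw = sum(w[i] for i in idx)
        return sum(w[i] * y[i] for i in idx) / sw, sw

    mu0, w0 = pooled(i for i, m in enumerate(is_2f) if m == 0)
    mu1, w1 = pooled(i for i, m in enumerate(is_2f) if m == 1)
    mu_all, _ = pooled(range(len(y)))

    # Q-between: weighted squared deviations of subgroup means from the overall mean.
    q_between = w0 * (mu0 - mu_all) ** 2 + w1 * (mu1 - mu_all) ** 2

    # Meta-regression slope for the binary moderator and its Wald statistic.
    slope = mu1 - mu0
    se = math.sqrt(1.0 / w0 + 1.0 / w1)
    z = slope / se

    # Upper-tail chi-square probability with 1 df.
    p = math.erfc(math.sqrt(q_between / 2.0))
    return {"q_between": q_between, "slope": slope, "se": se, "z": z, "p": p}


# Illustrative placeholder CFI values and sample sizes (not extracted data).
cfi = [0.96, 0.97, 0.93, 0.94]
n = [400, 500, 300, 600]
model_class = [0, 0, 1, 1]
result = q_between_binary(cfi, [1.0 / ni for ni in n], model_class)
print(result)
```

For Cronbach's α the same function would be applied to Fisher-z values with vᵢ = 1/(N−3), consistent with the statement that α is analysed on the Fisher-z scale.
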
## Anticipated timeline

- **2026-04-30:** OSF Open-Ended Registration submitted (this registration).
- **2026-05-01 onward:** Library-mediated multi-database search execution (Embase, PsycINFO, CINAHL, Cochrane CENTRAL, LILACS, Scopus, KoreaMed, SciELO).
- **2026-05–06:** Dual-reviewer screening (Cohen's κ ≥ 0·70 target).
- **2026-06–07:** Full-text COSMIN Risk of Bias checklist application.
- **2026-07-31:** Anticipated v2 manuscript submission to peer-reviewed journal (target: Lancet Psychiatry; alternative targets: BMJ Open, Journal of Affective Disorders).

## Funding and conflicts of interest

**Funding:** None. This systematic review is conducted without dedicated external funding. The corresponding author's institutional affiliation provides standard library and database access.

**Conflicts of interest:** The reviewers declare no competing interests. None of the authors of the curated primary studies is a member of the review team.

## Data availability

Data and supporting materials will be made available on OSF following peer-review submission.

## Provisional author list (transparency)

The full provisional author list comprises six contributors, in the following order: (1) Jung Moses Koo (Oxford Department of International Development, Wolfson College, Oxford University, UK); (2) Yong-Tae Kwak (Department of Neurology, Yong-In Hyo-Ja Hospital, Gyunggi-Do, South Korea); (3) Sun-Hyun Kim (Department of Family Medicine, Catholic Kwandong University International St. Mary's Hospital, Incheon, South Korea); (4) Jin-Yong Jun (Department of Psychiatry, College of Medicine, Ulsan University, Ulsan, South Korea); (5) Eunju Kim (Delaware Department of Health and Social Services, Delaware Psychiatric Center, Delaware, USA); (6) Min-Seong Koo (Department of Psychiatry, Catholic Kwandong University International St. Mary's Hospital, Incheon, and College of Medicine, Catholic Kwandong University, Gangwon, South Korea; corresponding author and guarantor).
The initial OSF registration is filed by three confirmed contributors (Jung Moses Koo, Yong-Tae Kwak, Min-Seong Koo) who jointly lead the PRISMA review process. The remaining provisional contributors (Sun-Hyun Kim, Jin-Yong Jun, Eunju Kim) will be added as OSF Contributors after individual confirmation.

## Registration history an
Koo et al. (Thu,) studied this question.