Med-PaLM 2 predicted depression scores from clinical interviews with 80-84% accuracy, yielding scores statistically indistinguishable from human clinical raters.
Does Med-PaLM 2 accurately predict psychiatric functioning and diagnoses from clinical interviews and case descriptions compared to human raters?
Med-PaLM 2 demonstrates the emergent capability to accurately assess depression severity from clinical interviews, though performance varies across other psychiatric conditions like PTSD.
Absolute Event Rate: 8.5% vs 7.94%
p-value: p=0.23
The current work investigates whether large language models (LLMs) explicitly trained on large corpora of medical knowledge (Med-PaLM 2) can predict psychiatric functioning from patient interviews and clinical descriptions without being trained to do so. To assess this, n = 145 depression assessments, n = 115 PTSD assessments, and n = 46 clinical case studies across high-prevalence, high-comorbidity disorders (depressive, anxiety, psychotic, trauma and stress, and addictive disorders) were analyzed using prompts to extract estimated clinical scores and diagnoses. Results demonstrate that Med-PaLM 2 can assess psychiatric functioning across a range of psychiatric conditions, with the strongest performance in predicting depression scores from standardized assessments (accuracy range = 0.80-0.84), which were statistically indistinguishable from those of human clinical raters, t(1,144) = 1.20, p = 0.23. Results show the potential for general clinical language models to flexibly predict psychiatric risk based on free descriptions of functioning from both patients and clinicians.
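As a rough consistency check on the reported statistic, the sketch below recomputes a two-sided p-value from t = 1.20. It assumes the test has 144 degrees of freedom (consistent with a paired comparison over the 145 depression assessments, although the abstract does not state the test design) and integrates the Student's t density numerically with Simpson's rule. The function names `t_pdf` and `two_sided_p` are illustrative and do not come from the paper.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value, using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # P(-|t| < T < |t|) by symmetry
    return 1.0 - central_mass


# Assumption: df = 144, taken from the second value in the reported t(1,144).
p = two_sided_p(1.20, 144)
print(round(p, 2))  # prints 0.23
```

Under that assumption the recomputed value agrees with the reported p = 0.23, meaning no statistically significant difference between model and human scores was detected at conventional thresholds.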
Galatzer-Levy et al. evaluated Med-PaLM 2 against human clinical raters in psychiatric disorders (depression, PTSD; n=306). On the prediction of depression scores (PHQ-8) from clinical interviews, the model reached 80-84% accuracy, and its scores did not differ significantly from those of human raters (p=0.23).