December 23, 2024

Open Access

Harnessing multimodal approaches for depression detection using large language models and facial expressions

Abstract

Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, utilizing the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames is integrated with textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show the best results are achieved by enhancing text data with speech quality assessment, with a mean absolute error of 2.85 and root mean square error of 4.02. This study underscores the potential of automated depression detection, showing text-only models as robust and effective while paving the way for multimodal analysis.
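The two error metrics reported in the abstract can be illustrated with a minimal sketch. The function names and the PHQ-8 scores below are hypothetical toy values chosen only to show the arithmetic; they are not data or code from the study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large misses more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical PHQ-8 totals (valid range 0-24), NOT taken from the paper.
true_scores = [4, 10, 17, 2]
pred_scores = [6, 9, 13, 2]

mae = mean_absolute_error(true_scores, pred_scores)
rmse = root_mean_square_error(true_scores, pred_scores)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which is consistent with the reported pair of 2.85 and 4.02.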

Authors

Misha Sadeghi

Robert Richer

Bernhard Egger

Journals

SHILAP Revista de lepidopterología

npj Mental Health Research

Institutions

Friedrich-Alexander-Universität Erlangen-Nürnberg

Helmholtz Zentrum München
