May 14, 2026

Acoustic feature differences across client utterance classes in counseling dialogues

Key Points

The aim is to develop a model based on acoustic features of client utterances in counseling dialogues, focusing on variations related to depression status.
The study uses the DAIC-WoZ dataset, which contains recorded speech from clinical interviews with depressed and non-depressed participants.
Utterances were classified according to client-centered therapy principles using an LLM-based model with few-shot learning.
A two-way ANOVA assessed differences in acoustic features across utterance classes and depression status.
Acoustic features such as pitch variability and spectral slope differed significantly across utterance classes.
Depression status significantly influenced the acoustic marker differences among utterance classes.
Depressed participants showed high F0 in the emotional expression class and low F0 in the factual information class.

Abstract

This study aims to develop an acoustic-feature-based model of client utterances using the DAIC-WoZ dataset, a corpus of clinical interviews collected by the USC Institute for Creative Technologies. DAIC-WoZ contains recorded speech from counseling sessions involving 60 depressed and 133 non-depressed participants, and provides transcriptions, face video, and 11 standard acoustic features (F0, NAQ, QOQ, H1–H2, PSP, MDQ, peakSlope, Rd, MFCC, HMPDM, HMPDD). The utterances in the dataset had already been classified according to client-centered therapy principles in our previous studies, using an LLM-based model with few-shot learning. A two-way ANOVA was conducted for each acoustic feature, with client utterance class (five categories) and depression status (depressed versus non-depressed) as factors. The results showed significant differences in features such as pitch variability and spectral slope, both across utterance classes and between depression status groups. Furthermore, significant interaction effects indicated that depression status modulated the acoustic marker differences across utterance classes. For example, depressed participants showed high F0 in the emotional expression class, but low F0 in the factual information class. These findings are expected to be useful for the future development of multimodal counseling AI agents.
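
To make the analysis design concrete, the following is a minimal sketch of a two-way ANOVA with an interaction term for a single acoustic feature. It is not the authors' implementation. It assumes a balanced design (the same number of observations in every cell), which the actual data, with 60 depressed and 133 non-depressed participants, does not satisfy; unbalanced data would normally call for a regression-based ANOVA. The function name `two_way_anova`, the two class labels, and all numbers are hypothetical and chosen only for illustration. The study itself uses five utterance classes.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction (illustrative helper).

    `cells` maps (level_a, level_b) -> list of observations. Every cell must
    hold the same number n >= 2 of observations.
    Returns {effect: (sum_of_squares, degrees_of_freedom, F)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    pairs = list(product(a_levels, b_levels))
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(cells.get(p, [])) != n for p in pairs):
        raise ValueError("balanced design with n >= 2 per cell required")

    # Grand mean, cell means, and marginal means for each factor.
    grand = mean(x for obs in cells.values() for x in obs)
    cell_mean = {p: mean(cells[p]) for p in pairs}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for both main effects, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in pairs
    )
    ss_err = sum((x - cell_mean[p]) ** 2 for p in pairs for x in cells[p])

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(pairs) * (n - 1)
    ms_err = ss_err / df_err  # assumes some within-cell variation

    return {
        "A": (ss_a, df_a, ss_a / df_a / ms_err),
        "B": (ss_b, df_b, ss_b / df_b / ms_err),
        "A:B": (ss_ab, df_ab, ss_ab / df_ab / ms_err),
    }


# Made-up values for one feature: factor A is utterance class,
# factor B is depression status. These are NOT data from the study.
toy = {
    ("emotional", "depressed"): [5, 7],
    ("emotional", "control"): [3, 5],
    ("factual", "depressed"): [1, 3],
    ("factual", "control"): [3, 5],
}
results = two_way_anova(toy)
for effect, (ss, df, f_stat) in results.items():
    print(f"{effect}: SS={ss:.2f}, df={df}, F={f_stat:.2f}")
```

In this toy example the interaction term `A:B` captures the pattern described above, in which the difference between depressed and non-depressed speakers changes direction depending on the utterance class. Turning each F statistic into a p-value requires the F distribution, which the Python standard library does not provide; `scipy.stats.f.sf(f_stat, df, df_err)` is one option if SciPy is available.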
