This study aims to develop an acoustic-feature-based model of client utterances in the DAIC-WoZ dataset, a corpus of clinical interviews collected by the USC Institute for Creative Technologies. DAIC-WoZ contains recorded speech from counseling sessions involving 60 depressed and 133 non-depressed participants, and provides transcriptions, face video, and 11 standard acoustic features (F0, NAQ, QOQ, H1–H2, PSP, MDQ, peakSlope, Rd, MFCC, HMPDM, HMPDD). The utterances in the dataset had already been classified according to client-centered therapy principles in our previous studies, using an LLM-based model with few-shot learning. For each acoustic feature, a two-way ANOVA was conducted with client utterance class (five categories) and depression status (depressed versus non-depressed) as factors. The results showed significant differences in features such as pitch variability and spectral slope, both across utterance classes and between depression status groups. Furthermore, significant interaction effects indicated that depression status modulated how the acoustic markers differed across utterance classes. For example, high F0 was observed in depressed participants within the emotional expression class, whereas low F0 was found in depressed participants within the factual information class. These findings are expected to be useful for the development of multimodal counseling AI agents.
Kitamura et al. studied this question.
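The per-feature two-way ANOVA described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `two_way_anova`, the two class labels, and the F0 values are hypothetical and made up for illustration only. The sketch also assumes a balanced design (the same number of observations in every cell) and uses only the Python standard library, so it reports sums of squares, degrees of freedom, and F statistics but no p-values. The actual data are unbalanced (60 versus 133 participants), which would normally call for regression-based sums of squares in a statistics package.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_a, level_b) -> list of observations.
    Every combination of levels must be present with the same
    number of observations n (n >= 2).
    Returns {effect: (sum_of_squares, df, F)}; F is None for 'error'.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    keys = list(product(levels_a, levels_b))
    if set(keys) != set(cells):
        raise ValueError("every (A, B) combination must be present")
    n = len(cells[keys[0]])
    if n < 2 or any(len(cells[k]) != n for k in keys):
        raise ValueError("this sketch needs balanced cells with n >= 2")

    na, nb = len(levels_a), len(levels_b)
    grand = mean(x for k in keys for x in cells[k])
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)])
              for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)])
              for b in levels_b}
    cell_mean = {k: mean(cells[k]) for k in keys}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = n * nb * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * na * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a, b in keys
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k in keys for x in cells[k])

    df_a, df_b = na - 1, nb - 1
    df_ab = df_a * df_b
    df_err = na * nb * (n - 1)
    if ss_err == 0:
        raise ValueError("zero within-cell variance; F is undefined")
    ms_err = ss_err / df_err

    def f_stat(ss, df):
        return (ss / df) / ms_err

    return {
        "A": (ss_a, df_a, f_stat(ss_a, df_a)),
        "B": (ss_b, df_b, f_stat(ss_b, df_b)),
        "AxB": (ss_ab, df_ab, f_stat(ss_ab, df_ab)),
        "error": (ss_err, df_err, None),
    }


# Made-up F0 values (Hz) for two of the utterance classes; NOT real data.
toy_f0 = {
    ("emotional", "depressed"): [210, 220, 215],
    ("emotional", "control"): [190, 200, 195],
    ("factual", "depressed"): [170, 180, 175],
    ("factual", "control"): [190, 200, 195],
}

for effect, (ss, df, f) in two_way_anova(toy_f0).items():
    print(effect, ss, df, f)
```

In this toy layout, factor A stands for utterance class and factor B for depression status. The values were chosen so that the depressed group is higher in one class and lower in the other, which produces a large interaction term and no main effect of depression status. That is the kind of crossover pattern the abstract describes for F0.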