What type of study is this?

This is a quantitative study.

November 2, 2025 | Open Access

Multimodal Depression Recognition Based on Sentence-level Dynamic Multimodal Split Attention Fusion

Key Points

MTFNet, the proposed multimodal temporal fusion network, achieves 86% accuracy on the DAIC-WOZ dataset and 84% on the self-built Chinese dataset MDD2025.
The study introduces a sentence-level dynamic multimodal attention fusion module to improve interaction between modalities.
The network targets two limitations of existing methods: weak modal interaction and poor semantic modeling of long sequences.
These findings suggest that improved recognition methods may support early screening and clinical diagnosis of depression.

Abstract

Depression is a common yet highly covert mental disorder, making the development of efficient intelligent recognition methods crucial for early screening and clinical diagnostic support. Existing multimodal depression recognition approaches still face limitations in modal interaction and long-sequence semantic modeling, and struggle to fully capture local dynamics and cross-modal dependencies. To address this, the study proposes a multimodal temporal fusion network (MTFNet). The approach first divides long medical interview sequences into sentence-level units based on timestamps to mitigate information dilution in lengthy sequences. It then applies a sentence-level dynamic multimodal attention fusion module, which further splits each sentence sequence into contiguous segments and, through dynamic weight allocation, adaptively emphasizes key modal features while suppressing redundant and noisy information. On the public dataset DAIC-WOZ and the self-built Chinese dataset MDD2025, MTFNet achieves accuracy rates of 86% and 84%, respectively.
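
The two steps the abstract describes, timestamp-based sentence splitting and weighted fusion of modal features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the half-open time spans, and the use of a plain softmax over per-modality scores are all assumptions made for illustration. In MTFNet the weights would come from a learned attention module, whereas here the scores are simply passed in by hand.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def split_by_sentences(
    frame_times: Sequence[float],
    frames: Sequence[Vector],
    sentence_spans: Sequence[Tuple[float, float]],
) -> List[List[Vector]]:
    """Group frame-level features into sentence-level units.

    A frame belongs to a sentence when its timestamp falls in the
    half-open span [start, end). The span convention is an assumption.
    """
    units = []
    for start, end in sentence_spans:
        unit = [f for t, f in zip(frame_times, frames) if start <= t < end]
        units.append(unit)
    return units


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features: Sequence[Vector], scores: Sequence[float]) -> Vector:
    """Weighted sum of per-modality feature vectors.

    `scores` stands in for the output of a learned attention module
    (hypothetical here); a higher score gives that modality more weight.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * vec[i] for w, vec in zip(weights, features)) for i in range(dim)]


# Toy usage: six audio frames, two sentences, then fuse two modalities.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
audio_frames = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
sentence_units = split_by_sentences(times, audio_frames, [(0.0, 1.2), (1.2, 3.0)])

audio_vec, text_vec = [1.0, 0.0], [0.0, 1.0]
fused = fuse_modalities([audio_vec, text_vec], scores=[2.0, 0.5])
```

Because the weights are normalized to sum to one, a modality with a low score is damped rather than dropped, which is one simple way to read "suppressing redundant and noisy information".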

Cite This Study

Sun et al. studied this question.

synapsesocial.com/papers/6906a3a98b61f987b17a0110 https://doi.org/10.54097/zs7j8602
