March 18, 2024 · Open Access

MDAVIF: A Multi-Domain Acoustical-Visual Information Fusion Model for Depression Recognition from Vlog Data

Abstract

With the explosive growth of social media, more and more people, including those with depressive symptoms, have begun to express their emotions online through vlogs, which makes video-based depression recognition increasingly important. Because video data contain rich acoustical and visual information, existing methods face two main challenges: (1) how to accurately mine depression-related features from massive amounts of data, and (2) how to effectively fuse features from different modalities. In this paper, a multi-domain acoustical-visual information fusion network (MDAVIF) is designed to extract depression-related spatio-temporal features from image sequences and audio, and an adaptive feature interaction module is proposed to fuse these features. Combined with two autoencoders that help retain information and prevent overfitting, the proposed method achieves state-of-the-art results on the D-vlog dataset, with a precision of 74.25% and an F1-score of 75.25%.
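The abstract does not say how the adaptive feature interaction module or the two autoencoders are built. As a rough illustration only, the sketch below shows one common way to fuse an acoustic and a visual embedding adaptively: a learned sigmoid gate (a standard gated-fusion technique, swapped in here as an assumption) decides, per feature dimension, how much of each modality to keep. A one-layer autoencoder reconstruction loss illustrates how an auxiliary branch can push the fused representation to retain information. The functions `gated_fusion` and `reconstruction_loss`, the weight names, and all shapes are hypothetical; this is not the MDAVIF implementation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(audio: np.ndarray, visual: np.ndarray,
                 w_gate: np.ndarray, b_gate: np.ndarray) -> np.ndarray:
    """Hypothetical adaptive fusion of one acoustic and one visual embedding.

    audio, visual: (batch, d) features, assumed already projected to the same size d.
    w_gate: (2*d, d) gate weights; b_gate: (d,) gate bias (both would be learned).
    Returns a (batch, d) array; each dimension is a convex mix of the two inputs.
    """
    joint = np.concatenate([audio, visual], axis=-1)   # (batch, 2d)
    gate = sigmoid(joint @ w_gate + b_gate)            # (batch, d), values in (0, 1)
    return gate * audio + (1.0 - gate) * visual        # gate near 1 keeps audio, near 0 keeps visual


def reconstruction_loss(x: np.ndarray, w_enc: np.ndarray, w_dec: np.ndarray) -> float:
    """Mean squared error of a one-layer autoencoder (tanh encoder, linear decoder).

    An auxiliary loss like this (hypothetical here) encourages the code to keep information about x.
    """
    z = np.tanh(x @ w_enc)      # (batch, k) bottleneck code
    x_hat = z @ w_dec           # (batch, d) reconstruction of x
    return float(np.mean((x - x_hat) ** 2))


# Toy usage with random features: batch of 4 clips, 8-dimensional embeddings.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
visual = rng.normal(size=(4, 8))
fused = gated_fusion(audio, visual, rng.normal(size=(16, 8)) * 0.1, np.zeros(8))
loss = reconstruction_loss(fused, rng.normal(size=(8, 3)), rng.normal(size=(3, 8)))
print(fused.shape, round(loss, 4))
```

When the gate saturates near 1 or 0 for a dimension, that dimension effectively comes from a single modality. In a trained system the gate weights and the autoencoder weights would typically be learned jointly with the depression classifier rather than drawn at random as in this toy example.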