Online social networks contain a wealth of multimodal data combining text and images, providing a promising new opportunity for the early detection of depression. Most existing depression detection methods rely on a single modality or simply concatenate different modalities. In addition, these methods often lack modality alignment across a user's long time series of tweets. To address these issues, we propose a novel model called Cross-Attention Transformer for Depression Detection (CATDD), which incorporates two modules, Textual Feature Extraction (TFE) and Visual Feature Extraction (VFE), to extract text and image features from each individual tweet of a user. We also devise a Text Based Cross-Attention Fusion (TBCAF) module to effectively fuse the two modalities, and use a two-layer Bi-LSTM to obtain a long time series representation of each user. Experimental results on a multimodal depression dataset show that the proposed method outperforms the baseline models.
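As a rough illustration of the text-based cross-attention idea, the Python sketch below uses per-tweet text features as queries and image features as keys and values. It is not the CATDD implementation: the function names, the omission of learned query/key/value projections, and the residual addition are all assumptions made to keep the example small.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_feats, image_feats):
    """Scaled dot-product cross-attention with text as the query side.

    text_feats:  list of text vectors (each a list of d floats), used as queries
    image_feats: list of image vectors (each a list of d floats), used as keys and values
    Returns one fused vector per text vector.

    Hypothetical simplification: no learned projections, single head.
    """
    d = len(text_feats[0])
    fused = []
    for q in text_feats:
        # Similarity of this text vector to every image vector, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in image_feats
        ]
        weights = softmax(scores)
        # Attention-weighted average of the image vectors.
        attended = [
            sum(w * v[j] for w, v in zip(weights, image_feats))
            for j in range(d)
        ]
        # Assumed residual connection: keep the text signal, add image context.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused
```

In a full model, the fused per-tweet vectors would then be passed in chronological order to a sequence encoder such as the two-layer Bi-LSTM described above; that stage is not sketched here.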
Zhao et al. studied this question.