What question did this study set out to answer?

The study aims to develop a multi-task method that detects depression and Parkinson’s disease from in-the-wild video data while enhancing interpretability and accuracy.

March 18, 2026 | Open Access

DEPART: Multi-Task Interpretable Depression and Parkinson’s Disease Detection from In-the-Wild Video Data

Key Points

Utilized a multi-task learning framework for simultaneous detection of depression and Parkinson’s disease.
Implemented body region extraction, CLIP-based visual encoding, and transformer-based temporal modeling.
Employed prototype-aware classification and gated fusion techniques for improved predictions.
Created gradient-based attention maps for visualizing significant regions impacting predictions.
Achieved Recall of 82.39% for depression and 78.20% for Parkinson’s disease using the multi-task approach.
Improved Recall for depression from 82.39% to 87.50% and for Parkinson’s disease from 78.20% to 86.14% after cleaning test data.
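Before the test data were cleaned, multi-task learning increased false positives for healthy individuals in the Parkinson’s disease subset, mainly due to annotation–modality mismatches, static visual content misinterpreted as motor impairments, and occasional body detection failures.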
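
The figures above are per-class Recall values, that is, the share of true members of a class that the model labels correctly. A minimal sketch of that computation follows, assuming the standard definition TP / (TP + FN). The helper name `recall` and the toy labels are hypothetical and are not taken from the study or the WSM corpus.

```python
def recall(y_true, y_pred, positive):
    """Recall for one class: true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError(f"no samples of class {positive!r} in y_true")
    return tp / (tp + fn)


# Toy labels invented for illustration only (not data from the study).
y_true = ["pd", "pd", "pd", "pd", "healthy", "healthy", "healthy", "healthy"]
y_pred = ["pd", "pd", "pd", "healthy", "healthy", "healthy", "pd", "pd"]

pd_recall = recall(y_true, y_pred, "pd")            # 3 of 4 PD samples found
healthy_recall = recall(y_true, y_pred, "healthy")  # 2 of 4 healthy samples found
print(pd_recall, healthy_recall)
```

In this toy case, each healthy sample predicted as "pd" is a false positive for the PD class and lowers Recall for the healthy class. This is the kind of effect the last key point describes.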

Abstract

Automated video-based detection of cognitive disorders can enable scalable, non-invasive health monitoring. However, existing methods focus on a single disease and provide limited interpretability, whereas real-world videos often contain co-occurring conditions. We propose DEPART (DEpression and PArkinson’s Recognition Technique), a novel unified multi-task method to detect depression and Parkinson’s disease (PD) from in-the-wild video data. It performs body region extraction, Contrastive Language-Image Pre-training (CLIP)-based visual encoding, Transformer-based temporal modeling, and prototype-aware classification with a gated fusion technique. Gradient-based attention maps are used to visualize the task-specific regions that drive predictions. Experiments on the In-the-Wild Speech Medical (WSM) corpus demonstrate competitive performance: the multi-task model achieves Recall of 82.39% for depression and 78.20% for PD, compared with 87.76% and 78.20% for the best single-task models. Multi-task learning initially increases false positives for healthy persons in the PD subset, mainly due to annotation–modality mismatches, static visual content misinterpreted as motor impairments, and occasional body detection failures. After cleaning the test data, Recall for healthy individuals becomes comparable across models; the multi-task model improves Recall for both depression (from 82.39% to 87.50%) and PD (from 78.20% to 86.14%), suggesting better robustness for real-life clinical applications.
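
The abstract does not say which signals the gated fusion combines or how the gate is computed. The sketch below shows one common form of the idea and should not be read as the DEPART implementation. It assumes a prototype branch (cosine similarity between a clip embedding and one prototype per class) is blended with a conventional linear-head branch through a scalar sigmoid gate. Every name and number here (`gated_fusion`, `gate_weights`, the toy vectors) is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(embedding, prototypes, linear_logits, gate_weights, gate_bias):
    """Blend prototype-based and linear-head class probabilities (assumed design)."""
    # Branch 1: similarity of the embedding to each class prototype.
    proto_probs = softmax([cosine(embedding, p) for p in prototypes])
    # Branch 2: probabilities from a conventional classifier head.
    linear_probs = softmax(linear_logits)
    # Scalar gate in (0, 1), computed from the embedding itself.
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, embedding)) + gate_bias)
    # Convex combination, so the fused scores still sum to 1.
    fused = [gate * a + (1.0 - gate) * b for a, b in zip(proto_probs, linear_probs)]
    return fused, gate


# Toy two-class example with invented values (index 0 = positive, 1 = healthy).
embedding = [1.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
probs, gate = gated_fusion(embedding, prototypes, [0.0, 0.0], [0.0, 0.0], 0.0)
print(probs, gate)
```

With zero gate weights and bias, the gate is 0.5 and the two branches are weighted equally. In a trained model the gate parameters would be learned, which lets the model lean on the prototype branch for some inputs and on the linear head for others.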
