What question did this study set out to answer?

This study explores whether deep learning models that combine eye-tracking and facial expression data can be used to assess cognitive decline.

April 25, 2026 | Open Access

Application of Deep Multi-Scale Representation Learning Based on Eye-Tracking and Facial Expression Data in Cognitive Decline Assessment

Key Points

Explored deep learning models that combine eye-tracking and facial expression data to assess cognitive decline.
Designed a deep neural network for multi-scale representation learning from behavioral sequences.
Collected eye-tracking and facial expression data from 20 healthy controls and 20 individuals with cognitive decline.
Used five multi-dimensional cognitive paradigms, with decision fusion to enhance diagnostic robustness.
Achieved 90% classification accuracy, outperforming traditional machine learning models.
Validated several features associated with cognitive decline through statistical analyses.
Explored potentially novel behavioral patterns relevant to cognitive health.
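
To make the idea of multi-scale representation learning concrete, the sketch below runs one behavioral sequence through parallel 1-D convolutions with different kernel widths, so that short and long temporal patterns are captured side by side. It is an illustration only, not the study's network: the function names, the kernel sizes (3, 5, 7), the fixed averaging kernels (standing in for learned weights), and the sample trace are all assumptions.

```python
from typing import Dict, List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide `kernel` over `signal` with zero padding so the output keeps the input length."""
    pad_left = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - pad_left
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_scale_features(
    signal: Sequence[float], kernel_sizes: Sequence[int] = (3, 5, 7)
) -> Dict[int, List[float]]:
    """Return one feature map per kernel size (hypothetical sizes, averaging kernels)."""
    return {k: conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes}


# Hypothetical 1-D trace, e.g. one gaze coordinate sampled over time.
gaze_x = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
features = multi_scale_features(gaze_x)
```

Each entry of `features` has the same length as the input, so the branches could be stacked channel-wise before further layers. In a trained model the kernel weights would be learned rather than fixed.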

Abstract

Digital biomarkers derived from eye-tracking and facial expression data hold significant potential for the non-invasive screening of cognitive decline (CD). However, existing approaches predominantly rely on single-task or feature engineering-based unimodal methods, which struggle to capture complex temporal behavioral patterns. While deep learning (DL) excels at extracting hierarchical features and intricate temporal dynamics from behavioral sequences, its application in this specific multimodal sensing domain remains exploratory. To address this gap, this study designed an assessment system integrating five multi-dimensional cognitive paradigms and collected eye-tracking and facial expression data from 20 healthy controls (HC) and 20 individuals with CD. For these multimodal sequences, we propose a deep neural network capable of multi-scale representation learning. Using subspace exploration and multi-scale convolutions, this architecture extracts deep representations directly from the data and incorporates a decision fusion mechanism to enhance diagnostic robustness. Experimental results show that our method achieves 90% classification accuracy, outperforming traditional machine learning models. Furthermore, statistical analyses validated several features associated with CD and explored several potentially novel behavioral patterns. This study confirms the feasibility of a DL framework based on eye-tracking and facial expression signals for identifying CD, providing a reference for developing objective and efficient digital screening tools.
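
The abstract does not specify the fusion rule, so the following is only a minimal sketch of one common form of decision fusion: soft voting, in which the per-paradigm probabilities of CD are averaged and the mean is thresholded. The function name, the paradigm labels, the probabilities, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Tuple


def fuse_decisions(task_probs: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float]:
    """Soft voting (an assumed rule): average per-paradigm P(CD) and threshold the mean."""
    if not task_probs:
        raise ValueError("at least one paradigm-level prediction is required")
    mean_prob = sum(task_probs.values()) / len(task_probs)
    label = "CD" if mean_prob >= threshold else "HC"
    return label, mean_prob


# Hypothetical outputs of five paradigm-specific classifiers for one participant.
probs = {
    "paradigm_1": 0.8,
    "paradigm_2": 0.6,
    "paradigm_3": 0.4,
    "paradigm_4": 0.7,
    "paradigm_5": 0.9,
}
label, score = fuse_decisions(probs)
```

Averaging means that a single paradigm with a weak or misleading signal (here `paradigm_3`) does not decide the outcome on its own, which is the sense in which fusion can improve robustness. Majority voting or a learned weighting would be alternative choices.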
