What question did this study set out to answer?

The research aims to develop an effective eye gaze detection model that operates in real time and adapts to individual users.

January 22, 2026 · Open Access

Eye Gaze Detection Using a Hybrid Multimodal Deep Learning Model for Assistive Technology

Key Points

Developed a hybrid multimodal deep learning model, GazeNet-HM.
Integrated features from RGB, depth, and infrared imaging.
Implemented a personalized adaptation module for minimal user calibration.
Used model compression techniques for efficient real-time performance.
Evaluated on public datasets and a newly created dataset, M-Gaze.
Achieved state-of-the-art performance, reducing mean angular error by up to 27.1% compared to leading unimodal methods.
Reached a real-time inference speed of 32 frames per second (FPS) on a Jetson Xavier NX after model compression.
Ablation studies validated the importance of each modality and design component.

Abstract

This paper presents a novel hybrid multimodal deep learning model for robust and real-time eye gaze estimation. Accurate gaze tracking is essential for advancing human–computer interaction (HCI) and assistive technologies, but existing methods often struggle with environmental variations, require extensive calibration, and are computationally intensive. Our proposed model, GazeNet-HM, addresses these limitations by synergistically fusing features from RGB, depth, and infrared (IR) imaging modalities. This multimodal approach allows the model to leverage complementary information: RGB provides rich texture, depth offers invariance to lighting and aids pose estimation, and IR ensures robust pupil detection. Furthermore, we introduce a personalized adaptation module that dynamically fine-tunes the model to individual users with minimal calibration data. To ensure practical deployment, we employ advanced model compression techniques, enabling real-time inference on resource-constrained embedded systems. Extensive evaluations on public datasets (MPIIGaze, EYEDIAP, Gaze360) and our collected M-Gaze dataset demonstrate that GazeNet-HM achieves state-of-the-art performance, reducing the mean angular error by up to 27.1% compared to leading unimodal methods. After model compression, the system achieves a real-time inference speed of 32 FPS on an embedded Jetson Xavier NX platform. Ablation studies confirm the contribution of each modality and component, highlighting the effectiveness of our holistic design.
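The abstract reports accuracy as mean angular error and improvement as a percentage reduction against a baseline. As a minimal sketch of how such figures are commonly computed, assuming gaze directions are given as 3D vectors (the function names and toy inputs below are illustrative and are not taken from the paper):

```python
import math


def angular_error_deg(pred, true):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, true))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error(preds, trues):
    """Average angular error in degrees over paired predictions and labels."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)


def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Toy vectors for illustration only; these are not results from the study.
preds = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trues = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(mean_angular_error(preds, trues))
```

Under this definition, a reported reduction of "up to 27.1%" is relative to the baseline method's error, not an absolute change in degrees.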

Journals

Applied Sciences

Institutions

University of Douala
