This paper presents a novel hybrid multimodal deep learning model for robust and real-time eye gaze estimation. Accurate gaze tracking is essential for advancing human–computer interaction (HCI) and assistive technologies, but existing methods often struggle with environmental variations, require extensive calibration, and are computationally intensive. Our proposed model, GazeNet-HM, addresses these limitations by synergistically fusing features from RGB, depth, and infrared (IR) imaging modalities. This multimodal approach allows the model to leverage complementary information: RGB provides rich texture, depth offers invariance to lighting and aids pose estimation, and IR ensures robust pupil detection. Furthermore, we introduce a personalized adaptation module that dynamically fine-tunes the model to individual users with minimal calibration data. To ensure practical deployment, we employ advanced model compression techniques, enabling real-time inference on resource-constrained embedded systems. Extensive evaluations on public datasets (MPIIGaze, EYEDIAP, Gaze360) and our collected M-Gaze dataset demonstrate that GazeNet-HM achieves state-of-the-art performance, reducing the mean angular error by up to 27.1% compared to leading unimodal methods. After model compression, the system achieves a real-time inference speed of 32 FPS on an embedded Jetson Xavier NX platform. Ablation studies confirm the contribution of each modality and component, highlighting the effectiveness of our holistic design.
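The headline metric above, mean angular error, is not defined in the abstract. A minimal sketch follows, assuming the usual definition in gaze estimation: the angle between the predicted and ground-truth 3D gaze direction vectors, averaged over samples. The function names and the numbers in the usage note are illustrative assumptions, not the authors' code or results.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumes both vectors are non-zero; they need not be unit length.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    cos_angle = dot / (norm_p * norm_t)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error(preds, targets):
    """Average angular error in degrees over paired predictions and targets."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


def relative_reduction(baseline_error, new_error):
    """Percentage reduction of new_error relative to baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under this definition, a hypothetical baseline at 4.0° and a model at 3.0° would give `relative_reduction(4.0, 3.0)` of 25.0, which is how a relative figure such as "up to 27.1%" would be read.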
Tatinyuy et al. studied this question.
synapsesocial.com/papers/6971bdcf642b1836717e27e1 — DOI: https://doi.org/10.3390/app16020986
Verdzekov Emile Tatinyuy
University of Douala
Auguste Vigny Noumsi Woguia
University of Douala
Mvogo Ngono Joseph
University of Douala
Applied Sciences