• First Mamba-based framework, HMamba-3DFT, tailored for 3D facial tracking
• BSTV-Mamba with BSTS-Scan captures spatiotemporal facial dynamics
• Dual optimization integrates dynamic emotion-driven modeling with semantic alignment

Monocular video-based 3D face tracking is vital for interactive pattern recognition and human avatars. Most existing image-based methods fail to model temporal dependencies in video, causing jitter and inaccuracies. Furthermore, they often neglect the continuous multi-modal signals present in facial videos, such as expression dynamics and emotional cues, which provide essential temporal drivers for facial modeling. To this end, this study is the first to explore the Mamba architecture for 3D facial tracking, proposing a hierarchical Mamba framework termed HMamba-3DFT. The proposed network efficiently captures and tracks variations in 3D facial shape from a monocular video. To exploit the global spatiotemporal correlations across frames of the dynamic face, we develop a bidirectional spatiotemporal vision Mamba (BSTV-Mamba) module featuring a bidirectional spatiotemporal selective scan (BSTS-Scan) mechanism. To capture the temporally evolving multi-modal emotion signals embedded in continuous video sequences, we introduce a dynamic emotion-driven mechanism. Additionally, to mitigate the potential loss of reconstruction fidelity caused by over-reliance on emotion-driven cues, we integrate facial semantic alignment with facial emotion driving to improve the accuracy of emotion-driven facial modeling. This dual-optimization strategy guides the network during training so that the reconstructed 3D facial mesh both captures the emotional attributes of the input frames and is reconstructed more precisely. Extensive evaluations on benchmark datasets demonstrate performance competitive with state-of-the-art methods.
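The abstract does not give the BSTS-Scan equations, so the following is only a minimal sketch of the general idea of a bidirectional spatiotemporal selective scan, under stated assumptions. It assumes each frame is already reduced to a list of scalar spatial-token features, flattens the clip frame by frame into one sequence, runs a simplified one-channel Mamba-style selective recurrence (input-dependent step size, `exp(delta * A)` discretization of `A`, Euler-style `delta * B`) forward and backward, and sums the two passes. The function names and the parameters `A`, `B`, `C`, `w` are hypothetical and are not the authors' implementation.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + e^z).
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def selective_scan(xs: List[float], A: float = -1.0, B: float = 1.0,
                   C: float = 1.0, w: float = 1.0) -> List[float]:
    """Hypothetical 1-channel selective scan; delta depends on the input."""
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w * x)      # input-dependent ("selective") step size
        a_bar = math.exp(delta * A)  # discretized A; A < 0 gives a decaying state
        b_bar = delta * B            # simplified (Euler-style) discretized B
        h = a_bar * h + b_bar * x    # linear state recurrence
        ys.append(C * h)
    return ys


def flatten_spatiotemporal(video: List[List[float]]) -> List[float]:
    # video[t][n] is the feature of spatial token n in frame t (frame-major order).
    return [v for frame in video for v in frame]


def bidirectional_st_scan(video: List[List[float]]) -> List[List[float]]:
    seq = flatten_spatiotemporal(video)
    fwd = selective_scan(seq)              # past frames -> current token
    bwd = selective_scan(seq[::-1])[::-1]  # future frames -> current token
    fused = [f + b for f, b in zip(fwd, bwd)]
    T, N = len(video), len(video[0])
    # Reshape back to (frames, spatial tokens).
    return [fused[t * N:(t + 1) * N] for t in range(T)]


video = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames x 2 tokens (toy data)
out = bidirectional_st_scan(video)
print(len(out), len(out[0]))
```

Because of the backward pass, every token's output depends on both earlier and later frames, which is the property a bidirectional spatiotemporal scan is meant to provide; a forward-only scan would let the first frame see nothing of the rest of the clip.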
Haodong Jin
Muwei Jian
Linyi University
Derui Ding
Pattern Recognition
University of Glasgow
University of Shanghai for Science and Technology
Shandong University of Finance and Economics
Jin et al. studied monocular video-based 3D face tracking.
synapsesocial.com/papers/69aa6f0d531e4c4a9ff5936f — DOI: https://doi.org/10.1016/j.patcog.2026.113415