What question did this study set out to answer?

The aim is to develop a framework for dynamic 3D facial reconstruction using only sequential landmarks as input.

June 17, 2026

FaceNeRF--: Dynamic 3D Facial Reconstruction and Manipulation From Sequential Landmarks

Key Points

Integrated lightweight landmark observations into implicit neural representations
Implemented hypothetical projecting rays (HPRs) for ray direction estimation
Developed masked hierarchical sampling (Masked-HS) to separate head pose from facial expressions
Achieved high-quality dynamic face reconstruction and head pose prediction
Supported applications such as real-time reenactment and expression editing
Demonstrated significant reduction in data acquisition requirements while maintaining synthesis performance

Abstract

Recent advances in 3D human face reconstruction have demonstrated remarkable progress in rendering quality and realism. However, most existing methods critically depend on precise prior knowledge, such as camera intrinsic and extrinsic parameters as well as detailed facial expression annotations, which are costly and impractical to acquire in unconstrained environments. This limitation severely hinders their applicability in real-world scenarios. To address this challenge, we present FaceNeRF--, a novel framework designed for dynamic 3D facial reconstruction and manipulation that requires only sequential facial landmarks as input. By integrating lightweight landmark observations into implicit neural representations, FaceNeRF-- is able to simultaneously estimate head pose and synthesize photorealistic face images, eliminating the reliance on camera calibration or predefined expression models. Our approach introduces two key innovations. First, we propose hypothetical projecting rays (HPRs), which allow ray directions to be estimated directly from predicted head poses, thereby enabling accurate volumetric rendering without known camera parameters. Second, we develop a masked hierarchical sampling (Masked-HS) strategy that effectively disentangles head pose from facial expressions, allowing the model to avoid overfitting to landmark inputs and to learn a more robust representation of dynamic facial geometry. Together, these techniques form a unified pipeline capable of self-supervised training, efficient inference, and explicit editing of facial expressions and head orientations. Extensive experiments on diverse in-the-wild datasets demonstrate that FaceNeRF-- achieves high-quality dynamic face reconstruction and accurate head pose prediction. In addition, our method supports practical downstream applications, including real-time reenactment, pose manipulation, and expression editing, highlighting its versatility and scalability.
Overall, FaceNeRF-- provides a lightweight yet powerful solution for dynamic 3D face modeling, significantly lowering the requirements for data acquisition while maintaining photorealistic synthesis performance.
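
To make the idea of rendering from a predicted head pose more concrete, the following is a minimal sketch of the two generic building blocks the abstract refers to: turning a pose rotation into a per-pixel ray direction, and compositing samples along that ray with the standard volume-rendering quadrature. It is not the paper's HPR or Masked-HS formulation, whose details are not given here. The pinhole model, the guessed focal length `focal`, the yaw-only rotation, and all function names are assumptions made purely for illustration.

```python
import math


def yaw_rotation(yaw_rad):
    """3x3 rotation about the vertical (y) axis, as nested lists.

    Stands in for a predicted head pose (hypothetical; a full pose would
    also include pitch, roll, and translation).
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def ray_direction(u, v, width, height, focal, rotation):
    """Unit ray direction for pixel (u, v) under an assumed pinhole camera.

    `focal` is a guessed focal length in pixels (an assumption, since the
    source works without known intrinsics).
    """
    # Direction in the camera frame, looking down -z (a common NeRF convention).
    d_cam = [(u - width / 2.0) / focal, -(v - height / 2.0) / focal, -1.0]
    # Rotate the direction by the pose.
    d = [sum(rotation[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def composite(sigmas, colors, deltas):
    """Standard volume-rendering quadrature along one ray.

    weight_i = T_i * (1 - exp(-sigma_i * delta_i)),
    where T_i is the transmittance accumulated before sample i.
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        weights.append(w)
        rgb = [acc + w * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb, weights


# Usage: the centre pixel of a 64x64 image, head turned by 30 degrees.
direction = ray_direction(32, 32, 64, 64, 50.0, yaw_rotation(math.radians(30)))
pixel, sample_weights = composite(
    sigmas=[0.1, 2.0, 5.0],
    colors=[[0.9, 0.7, 0.6], [0.8, 0.6, 0.5], [0.2, 0.2, 0.2]],
    deltas=[0.5, 0.5, 0.5],
)
```

In this sketch, changing only the rotation changes every ray direction while the sampled densities and colours stay the same, which is the sense in which a pose estimate can stand in for calibrated camera extrinsics. The per-sample weights returned by `composite` are the quantities a hierarchical sampler would typically reuse to place finer samples, though how Masked-HS masks or reweights them is not specified in this summary.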
