This paper addresses the reconstruction of three-dimensional facial geometry from a sequence of RGB images. For every frame, a convolutional neural network estimates per-pixel depth, yielding a point-cloud representation of the surface. To convert these discrete observations into a smooth, topologically consistent 3-D model, we adopt a variational formulation that minimises an energy functional comprising three terms: a data-fidelity term that drives the reconstructed surface toward the network-predicted depth; a geometric-smoothness term that penalises surface curvature to suppress noise while preserving detail; and a temporal-coherence term that discourages spurious frame-to-frame jitter while allowing genuine facial motion. The surface is discretised by a modified three-dimensional Delaunay triangulation that preserves local geometric relations and provides Lipschitz-consistent error bounds for the data term. Optimisation proceeds via explicit integration of a discrete Laplacian flow on the mesh vertices; because this operator is local, no global sparse linear system has to be solved, and the computational cost remains linear per video frame. Experiments on the datasets confirm that the proposed approach consistently reduces geometric error and mitigates temporal artefacts relative to classical keypoint-based and purely neural methods, albeit at the cost of a slightly denser mesh and the need to tune the regularisation weight.
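The explicit update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weights `lam` and `mu`, the step size `tau`, the quadratic form of the data and temporal terms, and the use of a uniform (umbrella) Laplacian as a stand-in for the curvature penalty are all assumptions made for illustration. Each vertex is updated from its own neighbours only, so one step costs time linear in the number of vertices when vertex degree is bounded.

```python
import numpy as np


def uniform_laplacian(vertices, neighbors):
    """Uniform (umbrella) Laplacian: mean of the neighbours minus the vertex.

    vertices:  (n, 3) array of vertex positions.
    neighbors: list of n lists of neighbour indices (hypothetical adjacency
               format, e.g. derived from the triangulation's edges).
    """
    vertices = np.asarray(vertices, dtype=float)
    lap = np.zeros_like(vertices)
    for i, nbrs in enumerate(neighbors):
        if nbrs:  # isolated vertices keep a zero Laplacian
            lap[i] = vertices[nbrs].mean(axis=0) - vertices[i]
    return lap


def flow_step(vertices, neighbors, target, previous, lam=0.5, mu=0.1, tau=0.1):
    """One explicit step of an assumed three-term descent.

    Assumed descent direction (illustrative, not taken from the paper):
      data term:      (V - target)    pulls toward network-predicted depth
      smoothness:     -lam * L(V)     Laplacian flow, damps curvature
      temporal term:  mu * (V - prev) pulls toward the previous frame
    """
    vertices = np.asarray(vertices, dtype=float)
    grad = (
        (vertices - target)
        - lam * uniform_laplacian(vertices, neighbors)
        + mu * (vertices - previous)
    )
    return vertices - tau * grad


if __name__ == "__main__":
    # Three vertices on a path; the middle one carries a spike in z.
    verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [2.0, 0.0, 0.0]])
    nbrs = [[1], [0, 2], [1]]
    smoothed = flow_step(verts, nbrs, target=verts, previous=verts)
    print(smoothed)
```

Because the integration is explicit, the step size `tau` has to stay small enough for stability; that restriction is the usual price for avoiding a global linear solve.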
Modenko et al. (Tue,) studied this question.