Synthesizing anatomically plausible facial expressions for embodied avatars requires bridging the gap between high-level semantic intent and low-level physical constraints. This study presents a unified architecture that establishes a “Semantic-Kinematic Loop,” explicitly coupling control based on the Facial Action Coding System (FACS) with biomechanical regularization. Unlike black-box neural renderers or purely geometric blendshape systems, our framework employs a multi-stage pipeline: semantic intent is first mapped to Action Units (AUs), which drive a coarse linear deformation; a fine-grained refinement stage then applies a topology-aware Inverse Kinematics (IK) solver. This solver enforces segment-length constraints and inter-region coupling, translating abstract affective signals into physically grounded surface deformations. The framework also exploits this kinematic structure to enable controlled perturbation strategies, which facilitate the generation of diverse, anatomically valid synthetic training data. The experimental results indicate that this hybrid approach eliminates surface-tearing artifacts and achieves superior anatomical fidelity in reproducing complex emotional states.
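The two deformation stages described above can be illustrated with a minimal sketch. It is not the solver used in this study: the refinement step below substitutes a simple iterative constraint projection (position-based relaxation) for the topology-aware IK solver, whose details are not given here. The function names `blend` and `refine`, the three-point chain, and the displacement values assigned to AU12 are all hypothetical and chosen only for illustration.

```python
import math


def blend(neutral, au_deltas, au_weights):
    """Coarse stage: vertex = neutral + sum_k w_k * delta_k (linear blendshapes)."""
    out = [list(p) for p in neutral]
    for au, weight in au_weights.items():
        for i, delta in enumerate(au_deltas[au]):
            for c in range(len(delta)):
                out[i][c] += weight * delta[c]
    return out


def edge_length(points, i, j):
    """Euclidean distance between points i and j."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(points[i], points[j])))


def refine(points, edges, rest_lengths, pinned=(), iterations=100):
    """Fine stage (stand-in): project each edge back to its rest length.

    Assumption: iterative constraint projection replaces the paper's IK solver.
    Pinned points never move; free endpoints share the correction equally.
    """
    pts = [list(p) for p in points]
    for _ in range(iterations):
        for (i, j), rest in zip(edges, rest_lengths):
            delta = [b - a for a, b in zip(pts[i], pts[j])]
            dist = math.sqrt(sum(d * d for d in delta))
            wi = 0.0 if i in pinned else 1.0
            wj = 0.0 if j in pinned else 1.0
            if dist == 0.0 or wi + wj == 0.0:
                continue
            # Signed relative length error, split between the movable endpoints.
            corr = (dist - rest) / dist / (wi + wj)
            for c in range(len(delta)):
                pts[i][c] += wi * corr * delta[c]
                pts[j][c] -= wj * corr * delta[c]
    return pts


# Hypothetical 2D chain of three surface points with unit-length segments.
neutral = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
edges = [(0, 1), (1, 2)]
rest_lengths = [edge_length(neutral, i, j) for i, j in edges]

# Hypothetical displacement basis for AU12 (lip corner puller): lifts the last point.
au_deltas = {"AU12": [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0)]}

coarse = blend(neutral, au_deltas, {"AU12": 1.0})  # stretches segment (1, 2)
refined = refine(coarse, edges, rest_lengths, pinned={0})
```

In this toy case the linear stage stretches the second segment beyond its rest length, and the relaxation step restores both segment lengths while keeping the pinned point fixed. Inter-region coupling and the perturbation strategies mentioned above are not modeled in this sketch.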
Wang et al. (Wed,) studied this question.