
January 16, 2026 · Open Access

SynthFace: Generating Secure Talking Avatars from Controlled Target Faces

Key Points

The aim is to create lifelike talking avatars that maintain user identity privacy in digital environments.
Developed a key-conditioned synthetic identity generator to create unique, non-invertible face representations.
Extracted facial expressions and head poses from a source video using 3D Morphable Models (3DMMs) and fused them with the synthetic identity at the coefficient level.
Synthesized speech-synchronized lip and head movements with a mel-spectrogram-driven expression generator.
Evaluated identity protection (ArcFace cosine similarity, SSIM), expression fidelity (FER classifier, GradSim similarity, Dlib 68-point landmark distances), and perceptual quality (PSNR, optical flow-based temporal consistency).
Experimental results show that SynthFace achieves high expression realism while strongly suppressing real-world identity.
The synthetic identity is reproducible from the same key and reference image, yet unlinkable to any real individual.
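The key-conditioned identity and the coefficient-level fusion described in this study can be illustrated with a minimal sketch. It rests on loudly stated assumptions: the study names only "a cryptographic hash of a secret key and a reference image", so HMAC-SHA256 is our stand-in; a seeded Gaussian sampler replaces the paper's learned identity generator; and the coefficient sizes (`ID_DIM`, `EXP_DIM`, `POSE_DIM`) and the names `FaceCoeffs`, `derive_seed`, `synthetic_identity`, and `fuse` are hypothetical, not the authors' code.

```python
import hashlib
import hmac
import random
from dataclasses import dataclass

# Hypothetical coefficient sizes; real 3DMM layouts vary by model.
ID_DIM, EXP_DIM, POSE_DIM = 80, 64, 6


@dataclass
class FaceCoeffs:
    identity: list[float]    # shape coefficients: who the face is
    expression: list[float]  # expression coefficients: what the face does
    pose: list[float]        # head rotation and translation


def derive_seed(secret_key: bytes, reference_image: bytes) -> int:
    # Assumption: HMAC-SHA256 stands in for the paper's cryptographic hash.
    digest = hmac.new(secret_key, reference_image, hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def synthetic_identity(secret_key: bytes, reference_image: bytes) -> list[float]:
    # Assumption: a seeded standard-normal sampler replaces the learned generator.
    # Same key and image give the same seed, hence the same coefficients;
    # recovering the key from the output would require inverting the hash.
    rng = random.Random(derive_seed(secret_key, reference_image))
    return [rng.gauss(0.0, 1.0) for _ in range(ID_DIM)]


def fuse(source: FaceCoeffs, synthetic_id: list[float]) -> FaceCoeffs:
    # Coefficient-level fusion: keep the source video's expression and pose,
    # replace only the identity coefficients with the synthetic ones.
    return FaceCoeffs(
        identity=list(synthetic_id),
        expression=list(source.expression),
        pose=list(source.pose),
    )
```

In this sketch, reusing a key reproduces the same avatar, while changing the key yields an unrelated identity; the source video contributes only motion (expression and pose), never identity coefficients.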

Abstract

In the era of digital communication, demand for expressive, lifelike avatars is rising rapidly, especially in domains where user identity privacy is paramount. We introduce SynthFace, a novel privacy-preserving framework for generating realistic talking avatars that faithfully convey user expressions while safeguarding the user's real-world identity. Unlike conventional methods that depend on GAN-generated samples or real facial imagery, SynthFace employs a key-conditioned synthetic identity generator that deterministically produces a unique, non-invertible face representation. This identity is derived from a cryptographic hash of a secret key and a reference image, ensuring both unlinkability to any real individual and consistent reproducibility. To animate the avatar, we extract facial expressions and head pose from a source video using 3D Morphable Models (3DMMs) and fuse them with the synthetic identity at the coefficient level. A mel-spectrogram-driven expression generator then synthesizes temporally coherent lip and head movements, enabling accurate speech-driven animation. We evaluate SynthFace along three dimensions: identity protection, using ArcFace cosine similarity and SSIM; expression fidelity, using a Facial Expression Recognition (FER) classifier, GradSim similarity, and Dlib’s 68-point landmark distances; and perceptual quality, using PSNR and optical flow-based temporal consistency. Experimental results and qualitative visualizations show that SynthFace achieves high expression realism while strongly suppressing identity, positioning it as a powerful tool for privacy-sensitive applications such as teletherapy, virtual education, and secure online communication.
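Three of the evaluation metrics named in the abstract have simple closed forms and can be sketched without model weights. This is an illustrative sketch, not the authors' evaluation code: ArcFace embeddings and Dlib landmarks are assumed to be computed elsewhere and passed in as plain sequences, PSNR assumes 8-bit pixel values (peak 255), and the function names are hypothetical. The paper's exact preprocessing and per-frame averaging are not specified here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Identity protection: compare two face embeddings (e.g. ArcFace outputs
    # computed elsewhere). Values near 1 mean the faces look like the same person.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    # Perceptual quality: PSNR = 10 * log10(MAX^2 / MSE) over flattened pixels.
    # Assumption: 8-bit images, so MAX defaults to 255.
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_landmark_distance(
    p: Sequence[tuple[float, float]], q: Sequence[tuple[float, float]]
) -> float:
    # Expression fidelity: mean Euclidean distance between corresponding
    # landmarks (68 points per face with Dlib's 68-point predictor).
    return sum(math.dist(a, b) for a, b in zip(p, q)) / len(p)
```

Under the usual reading of these metrics, a lower cosine similarity between the avatar's and the real user's embeddings suggests stronger identity suppression, while a higher PSNR and a lower landmark distance indicate closer agreement with the reference. SSIM, GradSim, the FER classifier, and optical-flow consistency depend on image or model libraries and are left out of this sketch.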
