We present LiveFace, a modular neural rendering system that achieves photorealistic talking-head animation at 30 fps on low-end mobile devices with as little as ~10 GFLOPS of compute (e.g., Qualcomm Snapdragon 439). Prior photorealistic facial animation systems either require cloud infrastructure with 100M+ parameter models (HeyGen, D-ID, Synthesia) or demand desktop-class GPUs (MetaHuman, Audio2Face), while on-device alternatives sacrifice realism for stylized cartoon aesthetics (Apple Memoji, Samsung AR Emoji). LiveFace bridges this gap through three key contributions: (1) a decomposed per-avatar decoder architecture that factorizes the face into four independently rendered regions — mouth, eyes, hair, and body — each handled by a compact neural decoder augmented with a 128-dimensional learnable identity embedding; (2) a universal compositor-upscaler (~7M parameters) shared across all avatars that composites the decoded patches onto a 9:16 portrait canvas and upscales to display resolution in a single forward pass; and (3) a video-driven knowledge distillation pipeline that uses RAVDESS emotional speech videos as driving sources for LivePortrait to generate diverse, naturalistic training data for the student decoders. The full system comprises ~20M INT8 parameters with a total inference latency of ~19 ms per frame, enabling real-time, fully offline operation on commodity mobile hardware without any cloud dependency.
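The latency and model-size figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the LiveFace system: the function names are hypothetical, the inputs are the approximate values stated in the abstract (30 fps, ~19 ms per frame, ~20M INT8 parameters), and the storage estimate assumes one byte per INT8 weight with no quantization metadata or runtime buffers.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All helper names are hypothetical; the inputs are the approximate
# values stated in the abstract, not measured results.

def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def headroom_ms(fps: float, latency_ms: float) -> float:
    """Per-frame time left over after inference at the target frame rate."""
    return frame_budget_ms(fps) - latency_ms


def max_sustainable_fps(latency_ms: float) -> float:
    """Upper bound on frame rate if inference were the only per-frame cost."""
    return 1000.0 / latency_ms


def int8_weight_megabytes(num_params: int) -> float:
    """Weight storage in MB, assuming exactly 1 byte per INT8 parameter
    (ignores scales, zero-points, and activation buffers)."""
    return num_params / 1e6


TARGET_FPS = 30            # stated in the abstract
LATENCY_MS = 19.0          # "~19 ms per frame"
TOTAL_PARAMS = 20_000_000  # "~20M INT8 parameters"

if __name__ == "__main__":
    print(f"frame budget: {frame_budget_ms(TARGET_FPS):.1f} ms")
    print(f"headroom:     {headroom_ms(TARGET_FPS, LATENCY_MS):.1f} ms")
    print(f"fps ceiling:  {max_sustainable_fps(LATENCY_MS):.1f}")
    print(f"weights:      {int8_weight_megabytes(TOTAL_PARAMS):.0f} MB")
```

At 30 fps the per-frame budget is about 33.3 ms, so a ~19 ms inference pass would leave roughly 14 ms for capture, display, and other per-frame work, which is consistent with the real-time claim under these assumptions.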
Dmitry Rodin
Nikita Rodin
Texas Tech University
Texas Tech University
Code Creator (Czechia)
Rodin et al. studied this question.
synapsesocial.com/papers/69d9e67a78050d08c1b76dcc — DOI: https://doi.org/10.5281/zenodo.19477081