What question did this study set out to answer?

The aim is to enable photorealistic facial animation on low-end devices without relying on cloud computing or high-end GPUs.

April 11, 2026 | Open Access

LiveFace: Real-Time Photorealistic Facial Animation on Low-End Mobile Devices via Compact Per-Avatar Neural Decoders and Universal Compositor-Upscaler

Key points

Aims to enable photorealistic facial animation on low-end devices without relying on cloud computing or high-end GPUs.
Introduces LiveFace, a modular neural rendering system.
Uses a decomposed per-avatar decoder architecture that renders four regions (mouth, eyes, hair, and body) independently.
Adds a universal compositor-upscaler, shared across all avatars, that composites the decoded patches and upscales them to display resolution.
Trains the student decoders with a video-driven knowledge distillation pipeline.
Runs in real time at 30 fps on low-end mobile hardware.
Comprises approximately 20 million INT8 parameters, with about 19 ms of inference latency per frame.
Targets photorealistic output, whereas existing on-device alternatives rely on stylized cartoon aesthetics.
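
The headline numbers in the key points can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation, not anything taken from the paper's code: it compares the reported per-frame latency with the frame budget implied by 30 fps, and estimates weight storage under the assumption that each INT8 parameter (the precision given in the abstract) occupies exactly one byte, ignoring quantization scales, zero points, and activation memory. All variable names are illustrative.

```python
# Figures quoted in the listing: 30 fps, ~19 ms per frame, ~20M parameters.
TARGET_FPS = 30
LATENCY_MS = 19.0
PARAM_COUNT = 20_000_000

# Assumption: one byte per INT8 weight; quantization metadata and
# activation buffers are not counted.
BYTES_PER_PARAM = 1

# Time available per frame at the target rate.
frame_budget_ms = 1000.0 / TARGET_FPS

# Slack left after inference, and the share of the budget consumed.
headroom_ms = frame_budget_ms - LATENCY_MS
budget_used = LATENCY_MS / frame_budget_ms

# Highest frame rate the reported latency alone would permit.
max_fps_from_latency = 1000.0 / LATENCY_MS

# Approximate weight storage in megabytes (10^6 bytes).
weight_megabytes = PARAM_COUNT * BYTES_PER_PARAM / 1e6

print(f"frame budget: {frame_budget_ms:.1f} ms")
print(f"headroom:     {headroom_ms:.1f} ms")
print(f"budget used:  {budget_used:.0%}")
print(f"max fps:      {max_fps_from_latency:.1f}")
print(f"weights:      {weight_megabytes:.0f} MB")
```

Under these assumptions, the reported latency uses a little over half of the 33.3 ms frame budget and leaves roughly 14 ms for everything outside inference, which is consistent with the real-time claim. The listing does not say how that remaining time is spent.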

Abstract

We present LiveFace, a modular neural rendering system that achieves photorealistic talking-head animation at 30 fps on low-end mobile devices with as little as ~10 GFLOPS of compute (e.g., Qualcomm Snapdragon 439). Prior photorealistic facial animation systems either require cloud infrastructure with 100M+ parameter models (HeyGen, D-ID, Synthesia) or demand desktop-class GPUs (MetaHuman, Audio2Face), while on-device alternatives sacrifice realism for stylized cartoon aesthetics (Apple Memoji, Samsung AR Emoji). LiveFace bridges this gap through three key contributions: (1) a decomposed per-avatar decoder architecture that factorizes the face into four independently rendered regions — mouth, eyes, hair, and body — each handled by a compact neural decoder augmented with a 128-dimensional learnable identity embedding; (2) a universal compositor-upscaler (~7M parameters) shared across all avatars that composites the decoded patches onto a 9:16 portrait canvas and upscales to display resolution in a single forward pass; and (3) a video-driven knowledge distillation pipeline that uses RAVDESS emotional speech videos as driving sources for LivePortrait to generate diverse, naturalistic training data for the student decoders. The full system comprises ~20M INT8 parameters with a total inference latency of ~19 ms per frame, enabling real-time, fully offline operation on commodity mobile hardware without any cloud dependency.
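
The data flow described in the abstract (four per-avatar region decoders feeding one shared compositor-upscaler) can be illustrated with a toy sketch. This is not the authors' implementation and contains no neural networks. The only values taken from the abstract are the region names, the 128-dimensional identity embedding, and the 9:16 portrait canvas. Everything else is a hypothetical placeholder chosen to show the structure: `RegionDecoder`, `composite_and_upscale`, the scalar `drive` signal, the patch sizes, the layout coordinates, and the nearest-neighbour upscaling.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

REGIONS = ("mouth", "eyes", "hair", "body")  # regions named in the abstract
EMBED_DIM = 128                              # identity embedding size from the abstract

Patch = List[List[float]]  # rows of grayscale values, a stand-in for an image tensor


@dataclass
class RegionDecoder:
    """Hypothetical stand-in for one compact per-avatar decoder."""
    region: str
    identity: List[float]   # plays the role of the learnable identity embedding
    size: Tuple[int, int]   # (height, width) of the patch; illustrative only

    def decode(self, drive: float) -> Patch:
        # Placeholder for a network forward pass: a constant patch whose
        # value depends on the identity embedding and the driving signal.
        height, width = self.size
        value = (sum(self.identity) / len(self.identity) + drive) / 2.0
        return [[value] * width for _ in range(height)]


def composite_and_upscale(
    patches: Dict[str, Patch],
    layout: Dict[str, Tuple[int, int]],
    canvas_hw: Tuple[int, int],
    scale: int,
) -> Patch:
    """Paste each patch at its (top, left) offset, then upscale the canvas.

    Stand-in for the shared compositor-upscaler; nearest-neighbour
    upscaling is an assumption made purely for illustration.
    """
    height, width = canvas_hw
    canvas = [[0.0] * width for _ in range(height)]
    for region, patch in patches.items():
        top, left = layout[region]
        for r, row in enumerate(patch):
            canvas[top + r][left:left + len(row)] = row
    return [
        [pixel for pixel in row for _ in range(scale)]
        for row in canvas
        for _ in range(scale)
    ]


# Hypothetical layout on a tiny 9-wide by 16-tall (9:16) canvas.
LAYOUT = {"hair": (0, 0), "eyes": (4, 2), "mouth": (7, 3), "body": (10, 0)}
SIZES = {"hair": (4, 9), "eyes": (2, 5), "mouth": (2, 3), "body": (6, 9)}

# One decoder per region, all tied to the same avatar identity.
decoders = {r: RegionDecoder(r, [0.5] * EMBED_DIM, SIZES[r]) for r in REGIONS}
patches = {r: d.decode(drive=0.25) for r, d in decoders.items()}

# The compositor is shared: it sees only patches and a layout, not the avatar.
frame = composite_and_upscale(patches, LAYOUT, canvas_hw=(16, 9), scale=2)
print(len(frame), "rows x", len(frame[0]), "columns")
```

The point of the sketch is the split of responsibilities: swapping avatars replaces only the small per-region decoders, while the compositor stage stays the same for every avatar, which matches the abstract's description of it as "shared across all avatars".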

Authors

Dmitry Rodin

Nikita Rodin

Texas Tech University

Institutions

Texas Tech University

Code Creator (Czechia)
