What type of study is this?

This is an experimental study.

October 20, 2025 (Open Access)

AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars

Key Points

AsynFusion achieves high-quality, real-time whole-body animations, enhancing the lifelikeness of digital avatars.
The model integrates facial expressions and gestures through a cooperative module, improving animation coordination.
Using diffusion transformers, AsynFusion outperforms existing methods in both quantitative and qualitative evaluations.
Innovative asynchronous sampling reduces computational burden while maintaining animation quality and synchronization.

Abstract

Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital entertainment, and remote communication. Existing approaches often generate audio-driven facial expressions and gestures independently, which introduces a significant limitation: the lack of seamless coordination between facial and gestural elements, resulting in less natural and cohesive animations. To address this limitation, we propose AsynFusion, a novel framework that leverages diffusion transformers to achieve harmonious expression and gesture synthesis. The proposed method is built upon a dual-branch DiT architecture, which enables the parallel generation of facial expressions and gestures. Within the model, we introduce a Cooperative Synchronization Module to facilitate bidirectional feature interaction between the two modalities, and an Asynchronous LCM Sampling strategy to reduce computational overhead while maintaining high-quality outputs. Extensive experiments demonstrate that AsynFusion achieves state-of-the-art performance in generating real-time, synchronized whole-body animations, consistently outperforming existing methods in both quantitative and qualitative evaluations.
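The abstract names bidirectional feature interaction between the face and gesture branches but does not describe how it is computed. One common way to realize such an exchange is symmetric cross-attention, where each branch queries the other and adds the result back as a residual. The following is a minimal sketch under that assumption, not the authors' Cooperative Synchronization Module: the function names `cross_attend` and `bidirectional_exchange` are hypothetical, and the sketch uses a single head with no learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """For each query vector, return an attention-weighted average of the context vectors.

    Assumption: scaled dot-product attention with keys == values == context
    and no learned projection matrices.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * k[i] for w, k in zip(weights, context)) for i in range(dim)]
        )
    return attended


def bidirectional_exchange(face, gesture):
    """Hypothetical exchange step: each branch attends to the other branch's
    features, and the attended context is added back as a residual.

    Both directions read the original inputs, so the update is symmetric.
    """
    face_ctx = cross_attend(face, gesture)
    gesture_ctx = cross_attend(gesture, face)
    new_face = [[x + c for x, c in zip(f, fc)] for f, fc in zip(face, face_ctx)]
    new_gesture = [[x + c for x, c in zip(g, gc)] for g, gc in zip(gesture, gesture_ctx)]
    return new_face, new_gesture


if __name__ == "__main__":
    face_tokens = [[1.0, 0.0], [0.0, 1.0]]   # two toy face feature vectors
    gesture_tokens = [[0.5, 0.5]]            # one toy gesture feature vector
    print(bidirectional_exchange(face_tokens, gesture_tokens))
```

In a real dual-branch transformer the queries, keys, and values would pass through learned projections and multiple heads; the sketch only shows the direction of information flow between the two modalities.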

Authors

T. Zhang

China Telecom (China)

Jian Zhao

Quzhou City People's Hospital

Ye Li

Shandong University of Traditional Chinese Medicine
