November 1, 2025

A Two‐Stage Controllable Co‐Speech Gesture Generation Method

Key Points

The method achieves competitive performance and improves gesture diversity in virtual reality interaction, which may enhance user engagement.
A two-stage controllable gesture generation pipeline makes synthesized motions easier to edit, including in real-time settings.
A large language model-based decoder and a diffusion-based denoiser enable precise motion control and editing.
The framework offers greater flexibility in creating realistic gestures for virtual characters.

Abstract

Co‐speech gesture generation plays a key role in virtual reality interaction, synthesizing appropriate digital human actions to accompany speech. Generative algorithm‐based methods create realistic gestures that follow the rhythm and semantic content of speech, improving the interactive experience. To match the persona of a digital human, generated gestures often require additional modification before being applied to a virtual character. However, motion sequences are difficult to edit when generated from hidden motion representations. To make motion synthesis editable, the proposed method develops a two‐stage controllable gesture generation pipeline for the co‐speech gesture generation problem. In stage 1, we design a novel large language model‐based IK‐Decoder that takes speech and a style label as input to synthesize inverse kinematic style control points, which are highly editable. In stage 2, we divide the motion sequence into body and finger parts and learn a VQ‐based latent motion representation for each. A diffusion‐based IK‐Denoiser is then proposed to synthesize latent motion representations conditioned on the control points. Compared with other representative algorithms, the proposed method achieves competitive performance on metrics such as Fréchet Gesture Distance, Beat Consistency, and Diversity. To demonstrate controllability, it provides three explicit control strategies for motion editing. With these control points, we provide a new co‐speech gesture generation paradigm.
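
The stage 2 idea of learning separate VQ‐based representations for body and fingers can be illustrated with a minimal sketch of the generic vector quantization lookup step. This is not the authors' implementation: the feature sizes, the `split_body_fingers` and `quantize` helpers, and the random codebooks (standing in for learned ones) are all assumptions made for illustration, and the encoder, decoder, and training losses of a full VQ model are omitted.

```python
import numpy as np


def split_body_fingers(motion, n_body_dims):
    """Split per-frame motion features into body and finger parts.

    Assumes (hypothetically) that the first n_body_dims columns describe
    the body and the remaining columns describe the fingers.
    """
    return motion[:, :n_body_dims], motion[:, n_body_dims:]


def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry.

    features: (T, D) array of per-frame features.
    codebook: (K, D) array of code vectors.
    Returns the chosen indices (T,) and the quantized features (T, D).
    """
    # Squared L2 distance between every frame and every code: shape (T, K).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


# Hypothetical sizes, chosen only to keep the example small.
T, BODY_DIMS, FINGER_DIMS, K = 8, 6, 4, 16

rng = np.random.default_rng(0)
motion = rng.normal(size=(T, BODY_DIMS + FINGER_DIMS))

# Random codebooks stand in for codebooks that would normally be learned.
body_codebook = rng.normal(size=(K, BODY_DIMS))
finger_codebook = rng.normal(size=(K, FINGER_DIMS))

body, fingers = split_body_fingers(motion, BODY_DIMS)
body_idx, body_q = quantize(body, body_codebook)
finger_idx, finger_q = quantize(fingers, finger_codebook)

print("body code indices:  ", body_idx)
print("finger code indices:", finger_idx)
```

Keeping one codebook per part means each frame is described by two discrete indices, so a change to the finger codes leaves the body codes untouched.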
