Generating continuous and expressive human motion from textual descriptions is a critical challenge in applications such as gaming and filmmaking. Existing methods often struggle to maintain global coherence, realistic frame continuity, and smooth transitions. To address these limitations, we propose FCMD, a novel diffusion-based model for generating cohesive motion sequences from fine-grained textual descriptions. FCMD introduces three key innovations: (1) Fine-grained Text Fusion, which integrates detailed textual cues with transitional narratives to enhance semantic consistency; (2) History Motion Guidance, which ensures motion accuracy and consistency across consecutive frames; and (3) Smooth Stitching Sampling, which leverages preceding and current motion information to achieve seamless transitions. Additionally, FCMD employs a large language model (LLM) to refine motion datasets by extracting fine-grained textual descriptions. Extensive experiments demonstrate that FCMD outperforms state-of-the-art methods in generating coherent, natural, and highly controllable motion sequences.
Shuai Li
Siqi Wang
Xinyu Zhang
IEEE Transactions on Visualization and Computer Graphics
Beihang University
Beijing Information Science & Technology University
Li et al. studied this question.
www.synapsesocial.com/papers/69c8c195de0f0f753b39be8d — DOI: https://doi.org/10.1109/tvcg.2026.3677594