What does this research mean for the field?

The improved Transformer-based dance movement generation method enhances the naturalness, fluency, and controllability of generated dance movements. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to improve the quality of generated dance movements, focusing on controllability, rhythm, and style.

February 26, 2026

The Dance Movement Generation Method Based on an Improved Transformer

Key Points

The aim is to improve the quality of generated dance movements, focusing on controllability, rhythm, and style.
Developed a motion-aware self-attention mechanism
Designed a dual-stream structure for pose and motion
Introduced a cross-modal music conditioning module
Applied inverse kinematics and energy constraints
Employed hierarchical temporal modeling and semi-supervised training
Outperformed baseline models on multiple metrics, including Fréchet Inception Distance
Achieved higher accuracy in dance style classification
Generated more diverse and continuous dance sequences
Enhanced motion naturalness and temporal consistency
Improved controllability over dance styles

Abstract

Existing dance movement generation methods still exhibit significant deficiencies in controllability, rhythmic consistency, style retention, and long-term temporal dependency modeling. These drawbacks limit their practical deployment in applications such as virtual human driving and digital content generation. To address these gaps, this study proposes an improved Transformer-based dance movement generation method that aims to enhance the naturalness, fluency, and controllability of generated movements. First, the study constructs a motion-aware self-attention mechanism, which strengthens the model's ability to capture local dynamic changes by introducing temporal motion weights. Second, a dual-stream structure consisting of pose and motion streams is designed to jointly model spatial and temporal features. In addition, a cross-modal music conditioning module is introduced to align generated movements with rhythm, energy, and emotional tension. Inverse kinematics and energy constraints further improve the physical plausibility of movements. The model also enhances generation stability through hierarchical temporal modeling and semi-supervised training. Experimental results show that the proposed method consistently outperforms baseline models across indicators including Fréchet Inception Distance, Perceptual Evaluation of Motion Quality, Motion Diversity Score, and Speed and Acceleration Consistency. It also achieves higher accuracy, precision, and recall in dance style classification tasks. These results indicate that the model can effectively capture motion style features and generate continuous and diverse dance sequences. The proposed generation framework achieves a favorable balance among motion naturalness, temporal consistency, and style controllability. It can be applied to scenarios such as virtual digital human movement generation, dance creation assistance, and interactive immersive systems, providing a practically valuable technical pathway for automated dance content generation.
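
The abstract does not give the formulation of the motion-aware self-attention mechanism. As a rough illustration only, the sketch below assumes that the "temporal motion weights" act as an additive bias on attention scores, proportional to each frame's displacement from the previous frame. The function names, the `alpha` parameter, and the absence of learned query/key/value projections are all assumptions made for this example, not details taken from the paper.

```python
import math


def motion_magnitudes(poses):
    """Per-frame motion magnitude: Euclidean norm of the frame-to-frame difference."""
    mags = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(poses, poses[1:]):
        mags.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return mags


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def motion_aware_attention(poses, alpha=1.0):
    """Scaled dot-product self-attention with an additive motion bias on keys.

    Assumption (not from the paper): frames with larger motion receive a
    larger bias, scaled by the hypothetical strength parameter `alpha`.
    Returns (outputs, weights), where weights[i][j] is the attention that
    frame i pays to frame j.
    """
    dim = len(poses[0])
    mags = motion_magnitudes(poses)
    peak = max(mags) or 1.0  # avoid division by zero for a static sequence
    bias = [alpha * m / peak for m in mags]

    outputs, weights = [], []
    for query in poses:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) + b
            for key, b in zip(poses, bias)
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the (unprojected) pose vectors.
        outputs.append(
            [sum(w * key[d] for w, key in zip(row, poses)) for d in range(dim)]
        )
    return outputs, weights


# Toy sequence: four 2-D "poses" with a single jump between frames 1 and 2.
toy_poses = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
_, toy_weights = motion_aware_attention(toy_poses, alpha=1.0)
```

In this sketch, setting `alpha=0` recovers plain scaled dot-product attention, so the bias term is the only place where motion information enters. A real model would apply learned projections and multiple heads before this step.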

Cite This Study

Xie et al. (Wed,) studied this question.

synapsesocial.com/papers/699fe37b95ddcd3a253e7526 https://doi.org/10.1142/s0219519426400415
