What question did this study set out to answer?

The aim is to enhance controllable symbolic music generation by preserving reference melodies while responding to style prompts.

June 10, 2026 | Open Access

Controllable Symbolic Music Generation via Stage-Aware Style Routing and Differentiable Melody Regularization

Key Points

Developed HCDMG++, a hierarchical diffusion framework that combines stage-aware style routing with differentiable melody regularization.
Employed a residual multi-layer perceptron (MLP) with zero-initialized scalar gates to project text-derived style embeddings into harmony-, rhythm-, and timbre-specific subspaces.
Evaluated the system on a 384-sample benchmark covering four melodies, eight styles, four random seeds, and three denoising budgets.
HCDMG++ generated valid four-track outputs consistently across all 384 runs.
Achieved a peak pitch histogram similarity score of 0.508 under a 64-step budget.
Improved pitch histogram alignment over Legacy-HCDMG by roughly two orders of magnitude on the matched slice.
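
To make the pitch histogram metric more concrete, here is a minimal sketch of a soft pitch-class histogram and a similarity score between two melodies. The summary does not specify how the paper computes its score, so everything here is an assumption made for illustration: the function names `soft_pitch_class_histogram` and `histogram_similarity`, the 12-bin pitch-class layout, the Gaussian width `sigma`, and the use of histogram intersection as the similarity measure. In actual training the same idea would be written in an autodiff framework so that gradients can flow through the histogram.

```python
import math

def soft_pitch_class_histogram(pitches, sigma=0.5):
    """Soft 12-bin pitch-class histogram (illustrative assumption, not the paper's code).

    Each MIDI pitch spreads Gaussian weight over the 12 bins according to
    circular distance, and the result is normalized to sum to 1.
    """
    hist = [0.0] * 12
    for p in pitches:
        pc = p % 12
        for k in range(12):
            d = min(abs(pc - k), 12 - abs(pc - k))  # circular distance between pitch classes
            hist[k] += math.exp(-(d * d) / (2.0 * sigma * sigma))
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, lower otherwise."""
    return sum(min(a, b) for a, b in zip(h1, h2))

reference = soft_pitch_class_histogram([60, 62, 64, 65, 67])
generated = soft_pitch_class_histogram([60, 62, 63, 65, 68])
score = histogram_similarity(reference, generated)
```

Under this sketch, a generated melody that reuses the reference pitch classes scores close to 1.0 regardless of octave, while one that drifts to other pitch classes scores lower.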

Abstract

Controllable symbolic music generation must preserve a reference melody while remaining responsive to style prompts. Existing hierarchical diffusion systems typically reuse a shared condition vector across harmony, rhythm, and timbre stages, which can entangle stylistic factors and weaken melody preservation. We present HCDMG++, a hierarchical diffusion framework that addresses these two limitations through stage-aware style routing and differentiable melody regularization. The routing module uses a residual multi-layer perceptron (MLP) with zero-initialized scalar gates to project text-derived style embeddings into harmony-, rhythm-, and timbre-specific subspaces, whereas the regularization branch aligns soft pitch histograms and contour trajectories with the conditioning melody during training without breaking the differentiable computation graph. We evaluate the integrated system on a 384-sample benchmark covering four melodies, eight styles, four random seeds, and three denoising budgets, supplemented by a matched legacy-compatible reference and an inference-time component ablation that contrasts legacy behavior, silenced gates, an automated uniform gamma routing sweep, and the full forward pass. HCDMG++ produces valid four-track outputs in all 384 runs, reaches a peak pitch histogram similarity score of 0.508 under a 64-step budget, and improves pitch histogram alignment over Legacy-HCDMG by roughly two orders of magnitude on the matched slice, while attaining a positive Fisher-style style separability score where the legacy benchmark is too sparse to support one. These results indicate that stage-specific conditioning and differentiable structural guidance jointly improve controllability in symbolic music diffusion, while also exposing remaining limitations in long-form generalization and perceptual validation, which motivate the future work outlined at the end of the paper.
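
The routing idea in the abstract can be illustrated with a small sketch of a gated residual projection, one per stage. This is not the authors' implementation: the names `make_router`, `route`, and `STAGES`, the layer sizes, the `tanh` activation, and the random initialization are all assumptions chosen to keep the example short and dependency-free. The point it shows is that a zero-initialized scalar gate makes each stage start from the shared style embedding and only depart from it as the gate moves away from zero.

```python
import math
import random

STAGES = ("harmony", "rhythm", "timbre")  # stage names taken from the abstract

def make_router(dim, hidden, seed=0):
    """Build one small MLP plus a zero-initialized scalar gate per stage (hypothetical layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        stage: {"w1": mat(hidden, dim), "w2": mat(dim, hidden), "gate": 0.0}
        for stage in STAGES
    }

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def route(router, style):
    """Return a stage-specific embedding: style + gate * MLP(style)."""
    out = {}
    for stage, p in router.items():
        hidden = [math.tanh(x) for x in matvec(p["w1"], style)]
        delta = matvec(p["w2"], hidden)
        out[stage] = [s + p["gate"] * d for s, d in zip(style, delta)]
    return out

router = make_router(dim=4, hidden=8)
style = [0.5, -1.0, 0.25, 2.0]
routed_at_init = route(router, style)   # every stage equals the shared embedding
router["harmony"]["gate"] = 1.0         # in training this value would be learned
routed_after = route(router, style)     # only the harmony embedding changes
```

With all gates at zero, the sketch reduces to the shared condition vector that the abstract attributes to existing systems, which is loosely comparable to the "silenced gates" setting in the ablation.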

Cite This Study

Zhou et al. studied this question.

synapsesocial.com/papers/6a28fecb6f82f25be989bf0b https://doi.org/10.3390/info17060568
