Controllable symbolic music generation must preserve a reference melody while remaining responsive to style prompts. Existing hierarchical diffusion systems typically reuse a single shared condition vector across the harmony, rhythm, and timbre stages, which can entangle stylistic factors and weaken melody preservation. We present HCDMG++, a hierarchical diffusion framework that addresses these two limitations through stage-aware style routing and differentiable melody regularization. The routing module uses a residual multi-layer perceptron (MLP) with zero-initialized scalar gates to project text-derived style embeddings into harmony-, rhythm-, and timbre-specific subspaces. The regularization branch aligns soft pitch histograms and contour trajectories with the conditioning melody during training without breaking the differentiable computation graph. We evaluate the integrated system on a 384-sample benchmark covering four melodies, eight styles, four random seeds, and three denoising budgets (4 × 8 × 4 × 3 = 384). The benchmark is supplemented by a matched legacy-compatible reference and an inference-time component ablation that contrasts legacy behavior, silenced gates, an automated uniform gamma routing sweep, and the full forward pass. HCDMG++ produces valid four-track outputs in all 384 runs, reaches a peak pitch histogram similarity score of 0.508 under a 64-step budget, and improves pitch histogram alignment over Legacy-HCDMG by roughly two orders of magnitude on the matched slice. It also attains a positive Fisher-style style separability score, whereas the legacy benchmark is too sparse to support one. These results indicate that stage-specific conditioning and differentiable structural guidance jointly improve controllability in symbolic music diffusion. They also expose remaining limitations in long-form generalization and perceptual validation, which motivate the future work outlined at the end of this paper.
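As an illustration of the stage-aware routing idea described above, the following is a minimal sketch of a residual projection with zero-initialized scalar gates. It is not the HCDMG++ implementation: the names `make_router` and `route`, the two-layer bias-free MLP, the tanh activation, and the placement of the gate on the residual branch are all assumptions made for this example. The property it shows is that, with every gate at zero, each stage receives the unmodified style embedding, so the router initially behaves like a shared condition vector.

```python
import math
import random

STAGES = ("harmony", "rhythm", "timbre")


def mlp(x, w1, w2):
    """Two-layer perceptron with a tanh hidden layer (biases omitted for brevity)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def make_router(dim, hidden_dim, seed=0):
    """Build one small MLP and one scalar gate per stage (hypothetical layout)."""
    rng = random.Random(seed)
    router = {}
    for stage in STAGES:
        w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden_dim)]
        w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(dim)]
        # Assumption: the gate starts at exactly zero, so the residual branch is silent.
        router[stage] = {"w1": w1, "w2": w2, "gate": 0.0}
    return router


def route(router, style):
    """Return one stage-specific embedding per stage: style + gate * MLP(style)."""
    routed = {}
    for stage, params in router.items():
        delta = mlp(style, params["w1"], params["w2"])
        routed[stage] = [s + params["gate"] * d for s, d in zip(style, delta)]
    return routed


style = [0.5, -1.0, 0.25, 2.0]  # stand-in for a text-derived style embedding
router = make_router(dim=4, hidden_dim=8)

at_init = route(router, style)    # every stage sees the shared embedding
router["rhythm"]["gate"] = 0.5    # pretend training has opened one gate
after_update = route(router, style)
```

In this sketch, opening the rhythm gate changes only the rhythm-stage embedding, while harmony and timbre still receive the original vector. Setting all gates back to zero gives one plausible reading of a "silenced gates" configuration, though the paper's ablation may define it differently.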
Zhou et al. (Mon,) studied this question.