Generating non-standard fonts such as running script (XingShu) is challenging because of their high stroke continuity, structural flexibility, and stylistic diversity, which traditional methods built on component-based prior knowledge struggle to model. Diffusion models capture continuous feature spaces and stroke variations well through iterative denoising, but they face three critical limitations: (1) style leakage, where large stylistic differences combined with noise interference lead to inconsistent outputs; (2) structural distortion, where the absence of explicit structural guidance results in broken strokes or deformed glyphs; and (3) style confusion, where similar font styles are inadequately distinguished, producing ambiguous results. To address these issues, we propose a novel skeleton-guided diffusion model with three key innovations: (1) a skeleton-constrained style rendering module that enforces semantic alignment and balanced energy constraints to amplify critical skeletal features, mitigating style leakage and ensuring stylistic consistency; (2) a cross-scale skeleton preservation module that integrates multi-scale glyph skeleton information through cross-dimensional interactions, modeling both macro-level layout and micro-level stroke detail to prevent structural distortion; and (3) a contrastive style refinement module that combines skeleton decomposition and recombination strategies with contrastive learning on positive and negative samples to establish robust style representations and disambiguate similar styles. Extensive experiments on diverse font datasets demonstrate that our approach significantly improves generation quality, achieving superior style fidelity, structural integrity, and style differentiation compared with state-of-the-art diffusion-based font generation methods.
Li et al. (Fri,) studied this question.
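To make the idea of contrastive learning on positive and negative style samples concrete, the following is a minimal sketch of an InfoNCE-style loss. It is an illustration only, not the loss used in the paper: the function names `cosine` and `info_nce`, the temperature value, and the use of plain Python lists as style embeddings are all assumptions, and the abstract does not specify the actual formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (hypothetical helper, not from the paper).

    The loss is low when `anchor` is more similar to `positive` (same style)
    than to every vector in `negatives` (different styles).
    """
    # Similarity scores scaled by temperature; index 0 is the positive pair.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Toy style embeddings (made-up values for illustration).
anchor = [1.0, 0.0]
same_style = [0.9, 0.1]
other_style = [0.0, 1.0]

good = info_nce(anchor, same_style, [other_style])
bad = info_nce(anchor, other_style, [same_style])
```

In this toy setup `good` is smaller than `bad`, since the loss penalizes an anchor that sits closer to a negative than to its positive. A real system would compute such a loss over learned style embeddings in batches rather than over hand-written vectors.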