What type of study is this?

This is a Experimental Study study.

October 5, 2025Open Access

Skeleton-Guided Diffusion for Font Generation

Key Points

A skeleton-guided diffusion model achieved significant improvements in font generation quality.
Key innovations include a skeleton-constrained style rendering module enhancing style consistency.
The method effectively integrates multi-scale glyph skeleton information to prevent structural distortions.
Contrastive learning is used to refine style representations, aiding in disambiguation of similar styles.

Abstract

Generating non-standard fonts, such as running script (e.g., XingShu), poses significant challenges due to their high stroke continuity, structural flexibility, and stylistic diversity, which traditional component-based prior knowledge methods struggle to model effectively. While diffusion models excel at capturing continuous feature spaces and stroke variations through iterative denoising, they face critical limitations: (1) style leakage, where large stylistic differences lead to inconsistent outputs due to noise interference; (2) structural distortion, caused by the absence of explicit structural guidance, resulting in broken strokes or deformed glyphs; and (3) style confusion, where similar font styles are inadequately distinguished, producing ambiguous results. To address these issues, we propose a novel skeleton-guided diffusion model with three key innovations: (1) a skeleton-constrained style rendering module that enforces semantic alignment and balanced energy constraints to amplify critical skeletal features, mitigating style leakage and ensuring stylistic consistency; (2) a cross-scale skeleton preservation module that integrates multi-scale glyph skeleton information through cross-dimensional interactions, effectively modeling macro-level layouts and micro-level stroke details to prevent structural distortions; (3) a contrastive style refinement module that leverages skeleton decomposition and recombination strategies, coupled with contrastive learning on positive and negative samples, to establish robust style representations and disambiguate similar styles. Extensive experiments on diverse font datasets demonstrate that our approach significantly improves the generation quality, achieving superior style fidelity, structural integrity, and style differentiation compared to state-of-the-art diffusion-based font generation methods.

Read Full Paperexternally

Ask AI

Helpful

Bookmark

View Full Paper