Abstract
Ceramic art design faces a persistent challenge in balancing cultural authenticity with production efficiency, because conventional workflows rely heavily on subjective artisan expertise. This paper presents a unified framework that integrates multimodal large language models with diffusion probabilistic models for surface-decoration generation and for computational proxy evaluation of the visual aesthetics of vessel-based ceramic objects; the dataset is drawn predominantly from Chinese ceramic traditions. The framework comprises three coupled modules: a multimodal semantic parsing module that extracts structured design attributes from textual briefs or reference images via instruction-tuned vision-language modeling augmented by a ceramic domain knowledge graph; a style-conditioned diffusion generation network that employs cross-attention semantic injection and adaptive layer normalization to synthesize ceramic designs under fine-grained stylistic control; and a multi-dimensional aesthetic quality evaluation model that scores generated outputs on compositional rationality, color harmony, texture fineness, and style consistency, combining the four scores through attention-weighted adaptive fusion calibrated against expert judgment. Experiments on a purpose-built dataset of over 21,000 annotated ceramic images spanning five major traditions, centered on Chinese porcelain and stoneware, show that the proposed method achieves a Fréchet Inception Distance (FID) of 28.35 and an Inception Score (IS) of 10.46, substantially outperforming existing baselines. The aesthetic evaluation model attains a Spearman rank-order correlation coefficient (SRCC) of 0.891 and a Pearson linear correlation coefficient (PLCC) of 0.903 against expert ground truth. Ablation studies confirm that the semantic parsing module is the most critical component, and that multi-dimensional decomposition with adaptive weighting significantly outperforms monolithic scoring approaches. We explicitly acknowledge that the framework operates on two-dimensional imagery and does not model three-dimensional form, clay-body or glaze material composition, or firing dynamics; the reported aesthetic scores therefore reflect computational proxies of visual properties rather than holistic aesthetic judgment.
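The abstract does not spell out how the attention-weighted fusion or the correlation metrics are computed. The following is a minimal sketch of one plausible reading, under stated assumptions: the overall score is a softmax-weighted sum of the four dimension scores, with the attention logits supplied directly (in the actual model they would be produced and calibrated against expert ratings); SRCC is computed as the Pearson correlation of average ranks, and PLCC as plain Pearson correlation, without the nonlinear logistic mapping that some quality-assessment studies apply first. The function names (`fuse_scores`, `pearson`, `spearman`, `average_ranks`) and all numbers are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical labels for the four evaluation dimensions named in the abstract.
DIMENSIONS = ("composition", "color_harmony", "texture_fineness", "style_consistency")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns attention logits into weights summing to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(sub_scores: Sequence[float], attention_logits: Sequence[float]) -> float:
    """Overall score = sum_k softmax(logits)_k * sub_score_k (assumed fusion rule)."""
    if len(sub_scores) != len(attention_logits):
        raise ValueError("need one attention logit per dimension score")
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, sub_scores))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation (used here as PLCC, with no logistic mapping)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation (SRCC) = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the paper.
    per_image_scores = [
        [7.5, 8.0, 6.5, 7.0],
        [5.0, 6.0, 5.5, 4.5],
        [8.5, 9.0, 8.0, 8.5],
    ]
    logits = [0.2, 0.5, -0.1, 0.3]  # would be learned/calibrated in a real model
    predicted = [fuse_scores(s, logits) for s in per_image_scores]
    expert = [7.2, 5.1, 8.7]
    print("SRCC:", spearman(predicted, expert))
    print("PLCC:", pearson(predicted, expert))
```

For brevity the sketch applies one logit vector to every image, so the weights are the same for all inputs; an adaptive scheme would instead predict a separate logit vector per image from its features.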
Wenda Zhao (Thu,) studied this question.