Image Generation Technology (IGT) based on Artificial Intelligence (AI) and Deep Learning (DL) has shown enormous potential in artistic creation. However, it still falls short in two respects: precise control of artistic style and fidelity of high-resolution output. To address these issues in AI-based IGT for artistic creation, namely inaccurate style control, limited resolution, and loss of artistic texture during Super-Resolution (SR) processing, this study proposes a framework named StyleDiffusion-HD. The framework integrates a Latent Diffusion Model (LDM) with Style Injection Attention (SIA) to achieve precise bimodal control, guiding generation with both text and a visual style. It also introduces an SR module based on Flow Matching (FM) that raises image resolution while preserving style consistency. Experiments are conducted on multi-source, high-quality artistic datasets and evaluated along several dimensions: generation quality, style consistency, image-text alignment, and subjective aesthetics. The results show that the proposed method outperforms mainstream models on the objective metrics Fréchet Inception Distance (FID), CLIP Score (CS), and Style Loss (SL), and receives high scores in subjective evaluations by both experts and the general public, demonstrating its effectiveness and practicality in improving the artistic presentation of images. This study offers a feasible technical path toward the key challenges in current AI art generation and a practical reference for developing high-quality AI-assisted artistic creation.
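The abstract does not define how Style Loss (SL) is computed. A common definition, in the style of Gatys et al., compares Gram matrices of feature maps taken from a pretrained network such as VGG. The sketch below assumes that definition. It uses plain Python lists in place of real network activations, and the names `gram_matrix` and `style_loss` as well as the normalisation choices are illustrative assumptions, not the authors' implementation.

```python
from typing import List

# A feature map is C rows (channels) x N columns (flattened H*W positions).
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Channel-by-channel Gram matrix of a C x N feature map, divided by N.

    Assumption: normalising by the number of spatial positions N; other
    papers normalise by C*N or C*H*W instead.
    """
    c = len(features)
    n = len(features[0])
    return [
        [sum(features[i][k] * features[j][k] for k in range(n)) / n for j in range(c)]
        for i in range(c)
    ]


def style_loss(generated: List[Matrix], reference: List[Matrix]) -> float:
    """Sum over layers of the mean squared difference between Gram matrices.

    `generated` and `reference` hold one feature map per layer (hypothetical
    inputs standing in for activations of a pretrained network).
    """
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same number of layers")
    total = 0.0
    for fg, fr in zip(generated, reference):
        gg, gr = gram_matrix(fg), gram_matrix(fr)
        c = len(gg)
        total += sum((gg[i][j] - gr[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)
    return total


if __name__ == "__main__":
    layer = [[1.0, 0.0], [0.0, 1.0]]  # 2 channels, 2 positions
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(style_loss([layer], [layer]))  # identical style statistics
    print(style_loss([layer], [zeros]))
```

Because Gram matrices discard spatial layout and keep only channel co-activation statistics, a lower value under this definition indicates closer texture and style, independent of where content appears in the image.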
Gao et al. studied this question.