In image style transfer, achieving efficient, high-quality style conversion is both an active research topic and an open challenge. This article proposes a guided target style transfer scheme that requires only a reference image and the content image to be converted, and renders the content image in the target style of the reference image. For model construction, the CLIP model first encodes the reference image, extracts vector descriptors along multiple style dimensions, and converts them into text embedding vectors that provide style guidance. A depthwise separable convolution branch then extracts the underlying features of the content image, and these features are fused with a Stable Diffusion model to generate the new image. Extensive experiments show that, compared with state-of-the-art methods, our method performs better on image style transfer tasks, with significant improvements in style similarity, image quality, and other metrics (for example, SSIM increases by 9.72%). This work offers new ideas and a feasible solution for target style transfer and supports its application in multiple fields.
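The abstract does not specify the layer configuration of the content-feature branch, so the following is only a generic sketch of a depthwise separable convolution, not the authors' extractor. It assumes a channel-first `(C, H, W)` array, stride 1, no padding, and cross-correlation as used in deep-learning frameworks; the function names `depthwise_separable_conv` and `param_counts` are hypothetical. The operation splits a standard convolution into a per-channel spatial filter (depthwise) followed by a 1x1 channel-mixing step (pointwise), which is what makes it cheaper.

```python
import numpy as np


def depthwise_separable_conv(x, depthwise_kernels, pointwise_weights):
    """Hypothetical helper: depthwise separable convolution.

    x: (C, H, W) input; depthwise_kernels: (C, k, k), one spatial filter per
    channel; pointwise_weights: (C_out, C), a 1x1 convolution.
    Assumes stride 1 and 'valid' (no) padding.
    """
    c, h, w = x.shape
    k = depthwise_kernels.shape[1]
    out_h, out_w = h - k + 1, w - k + 1

    # Depthwise step: each channel is filtered only by its own k x k kernel.
    depthwise = np.zeros((c, out_h, out_w))
    for ch in range(c):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[ch, i:i + k, j:j + k]
                depthwise[ch, i, j] = np.sum(patch * depthwise_kernels[ch])

    # Pointwise step: a 1x1 convolution mixes channels at every pixel.
    return np.einsum("oc,chw->ohw", pointwise_weights, depthwise)


def param_counts(c_in, c_out, k):
    """Weight counts (no biases) for a standard k x k convolution
    versus its depthwise separable factorization."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable


x = np.random.rand(3, 8, 8)
y = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(16, 3))
print(y.shape)
print(param_counts(64, 128, 3))
```

With 64 input channels, 128 output channels, and 3x3 kernels, the factorized form needs 8,768 weights instead of 73,728, which illustrates why such a branch is a lightweight choice for feature extraction.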
Wang et al. studied this question.