The rapid development of generative AI has sharply increased the use of AI-generated images, making it crucial to distinguish real images from fake ones. Generated images have legitimate uses, such as producing training data and creating images quickly, but the potential for misuse, such as deepfakes or the spread of false information, remains. This study explores a novel model based on the Swin Transformer architecture for distinguishing real images from fake images generated by CNN-based (Convolutional Neural Network) and GAN-based (Generative Adversarial Network) models. The Swin Transformer, a successor to the Vision Transformer (ViT), applies the Transformer architecture, which has shown outstanding performance in natural language processing, to images and demonstrates excellent pixel-level segmentation performance. Distinguishing real from fake images requires detailed pixel-level analysis, a setting in which the Swin Transformer exhibits higher accuracy. Better detection of fake images is expected to curb indiscriminate image generation and to support further measures, such as program-based detection and legal sanctions, that prevent the indiscriminate use of AI images.
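One reason the Swin Transformer is often considered well suited to fine-grained, pixel-level analysis is that it computes self-attention inside small non-overlapping windows and shifts the window grid between successive layers, so neighbouring windows can exchange information. The sketch below illustrates only that partitioning and shifting step in plain Python. It is not the study's implementation: the 4x4 toy image, the window size of 2, and the helper names `window_partition` and `cyclic_shift` are assumptions made for illustration, and the attention computation itself is omitted.

```python
def window_partition(image, window_size):
    """Split a 2-D grid (a list of rows) into non-overlapping square windows."""
    height, width = len(image), len(image[0])
    if height % window_size or width % window_size:
        raise ValueError("image sides must be multiples of window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in image[top:top + window_size]])
    return windows


def cyclic_shift(image, shift):
    """Roll the grid up and to the left by `shift` pixels, wrapping around."""
    rolled_rows = image[shift:] + image[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


# Toy 4x4 "image" whose pixel values are simply 0..15 (illustrative only).
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular layer: attention would run separately inside each 2x2 window.
regular = window_partition(image, 2)

# Shifted layer: roll by half the window size, then partition again.
shifted = window_partition(cyclic_shift(image, 1), 2)

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[0])  # [[5, 6], [9, 10]]
```

In the shifted layer, the first window contains pixels 5, 6, 9, and 10, which belonged to four different windows in the regular layer. Alternating the two layouts is how information propagates across window boundaries while the cost of each attention step stays local.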
Jiyoon Park (Sat,) studied this question.